REMARKS 

Applicants submit concurrently herewith a Revocation and Power of Attorney
and a Change of Correspondence Address for Application. The Examiner's attention is
invited to the Notice mailed February 26, 2004, which incorrectly references the Attorney
address and docket number. The correct Attorney address is as follows:

Jones Day
222 East 41st Street
New York, New York 10017

The correct attorney docket number is 11202-004-999. The corrected information is reflected
on the Revocation and Power of Attorney and Change of Correspondence Address forms
submitted herewith.

Applicants also request that the Petition For Extension of Time for three
months from September 4, 2003 up to and including December 4, 2003, the Amendment
Fee Transmittal Sheet, the Supplemental Information Disclosure Statement, the List of
References Cited By Applicant, the copies of references BN to BT, and the Substitute
Sequence Listing in paper and computer readable form, all previously submitted with the
response filed on December 4, 2003, be considered by the Examiner.

Claims 21-29 were pending in the instant application. By this amendment, 
claims 7-20 have been canceled without prejudice to Applicants' right to pursue the subject 
matter of the canceled claims in this application or other related applications. Claims 23, 25, 
27 and 29 have been amended, and new Claims 30-39 have been added to clarify the 
invention. No new matter is added. 

1. OBJECTION TO THE SPECIFICATION 

The specification is objected to as not complying with 37 C.F.R. 1.821(d) of the
Sequence Rules and Regulations. In response, Applicants have amended the specification to
add new SEQ ID NO: to the sequence listing and to add recitation of the new sequence 
identifiers to the specification where appropriate. No new matter is added. 

NYJD: 1513702.1 



A substitute Sequence Listing in paper form and in computer readable (compact disc)
form is submitted concurrently herewith. In accordance with 37 C.F.R. 1.821(f), Applicants
submit that the sequence listing information recorded in computer readable form is identical
to the paper form of the Sequence Listing.

2. OBJECTION TO THE CLAIMS 

Claims 23-24 are objected to by the Examiner for their dependence upon 
canceled claims. Applicants submit that claim 23 contained a typographical error and have 
amended claim 23, thus, rendering the objection to claims 23 and 24 moot. 

Claims 25-27 are objected to for improper multiple dependency. In response,
Applicants have amended claim 25 to be dependent on any one of claims 21 or 22. Claim 27
has been amended to depend from claims 21 or 22. Claim 29 has been amended to depend on
claim 4. New claims have been added to cover the subject matter canceled as a result of the
amendment of claims 25-27. No new matter is added.

In view of the amendments made herein, it is submitted that the objections are 
avoided and moot. 

3. THE REJECTION UNDER 35 U.S.C. § 112, FIRST PARAGRAPH, FOR 
LACK OF WRITTEN DESCRIPTION SHOULD BE WITHDRAWN 

Claims 23 and 24 are rejected under 35 U.S.C. § 112, first paragraph, as
containing subject matter which was not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventors, at the time the 
application was filed, had possession of the claimed invention. The Examiner contends that 
the specification does not provide written description support for substituting functions of 
hgro-1 polynucleotide for functions of C. elegans gro-1 since it was not demonstrated that 
fragments of hgro-1 gene can substitute for the C. elegans gene in the rescue of the e2400
mutant phenotype. The Examiner alleges that the specification did not contemplate a
subregion of gro-1 or hgro-1 that would rescue the e2400 phenotype.

The criteria for determining sufficiency of written description set forth in the
Guidelines for Examination of Patent Applications Under the 35 U.S.C. 112, ¶ 1, "Written
Description" Requirement ("the Guidelines") (published in the January 5, 2001 Federal
Register at Volume 66, Number 4, pages 1099-1111) specify that:

"Whether the specification shows that applicant was in 
possession of the claimed invention is not a single, simple 
determination, but rather is a factual determination reached by 
considering a number of factors. Factors to be considered in 
determining whether there is sufficient evidence of possession
include the level of skill and knowledge in the art, partial
structure, physical and/or chemical properties, functional
characteristics alone or coupled with a known or disclosed
correlation between structure and function and the method of
making the claimed invention." Id. at page 1106, column 2,
lines 25-41. 

Where the specification discloses any relevant identifying characteristics, i.e., 
physical, chemical and/or functional characteristics, sufficient to allow a skilled artisan to
recognize the applicant was in possession of the claimed invention, a rejection for lack of 
written description under Section 112, first paragraph, is misplaced. 

Furthermore, in accord with the Written Description Guidelines, what is 
conventional or well known to one of skill in the art need not be disclosed in detail and where 
the level of knowledge and skill in the art is high, a written description question should not
be raised (Fed. Reg. Vol. 66, No. 4, January 5, 2001, p. 1106).

Applicants submit that the specification disclosed a C. elegans gene consisting 
of 9 exons spanning 2 kb which was identified by use of a functional complementation assay 
based on C. elegans gro-1 (e2400) mutants. Although the cosmid clones identified by
Applicants apparently contained the gro-1 gene, one of skill in the art would recognize that
Applicants contemplated (i) cosmid clones and subclones that contain only a fragment of the
gro-1 gene, and (ii) using a functional assay that can identify gro-1 fragments which retain
functions of the gene. The skilled person and Applicants would have understood that the
cosmid clones contained random fragments of the C. elegans genome, and that one end of a 
cosmid or its subclones may lie within the gro-1 gene. 

Moreover, the specification disclosed that the human gro-1 gene was obtained 
by assembling fragments of hgro-1 nucleotide sequences that were identified by Applicants 
based on their sequence homology to the C. elegans gro-1 sequence (see specification at page 
14, lines 1-28). Thus, fragments of human gro-1 gene and their homology with gro-1 
sequences of other species, such as yeast and E. coli were disclosed in the specification as 
filed. Given the description of use of a functional assay to test gro-1 activity, and the 
sequence homology and the presence of the zinc finger motif in both human and C. elegans 
gro-1 (see Figure 9, page 14, lines 23-28), one of skill in the art would have understood 
Applicants to be in possession of the claimed invention at the time of filing. As pointed out 
in the previous Amendment, the level of skill in the art at the time of filing included C. 
elegans rescue assays where cDNAs from non-C. elegans species were introduced into
mutant nematodes to rescue mutant phenotypes (see References BL and BM). 

Applicants respectfully submit that the disclosure of uncharacterized human
expressed sequence tags, coupled with Applicants' teachings of their homology to the C.
elegans gro-1 provided in the specification and the high level of skill in the art, clearly
indicates that Applicants were in possession of the claimed invention.

In light of the foregoing reasoning, the rejection under 35 U.S.C. § 112, first 
paragraph for lack of written description support should be withdrawn. 

4. THE REJECTION UNDER 35 U.S.C. § 101 FOR LACK OF UTILITY 
SHOULD BE WITHDRAWN 

Claims 21-24 are rejected under 35 U.S.C. 101 because the claimed invention
is not supported by either a specific, substantial and credible asserted utility or a
well-established utility. The Examiner alleges that the reference of Golovko et al. fails to teach a
utility for the disclosed isopentenyl transferase, and therefore, the instant isopentenyl 
transferase cannot rely on being a member in the class of human isopentenyl transferase for 
utility. The Examiner also asserts that the relation of the instant isopentenyl transferase to 
spontaneous mutagenesis, genome stability and cancer or epigenetic control of gene 
expression is not a credible utility without a specific enablement set forth in the specification. 
Applicants respectfully disagree. 

According to applicable case law, "The threshold of utility is not high: An 
invention is 'useful' under section 101 if it is capable of providing some identifiable benefit."
Juicy Whip, Inc. v. Orange Bang, Inc., 185 F.3d 1364, 51 U.S.P.Q.2d 1700 (Fed. Cir. 1999).

The claimed invention encompasses polynucleotides that encode a human 
isopentenylpyrophosphate:tRNA transferase ("IPT") or a fragment thereof that exhibits the 
functional activity of the enzyme, and vectors and host cells comprising said polynucleotides. 
One of the utilities of the claimed invention is the making of the highly-conserved functional 
human IPT for use as a test reagent in scientific and/or medical research. 

According to MPEP 2107.01, a "specific utility" is specific to the subject 
matter claimed which contrasts with a general utility that would be applicable to the broad 
class of the invention. Here, the broad class of the invention includes any polynucleotides and
recombinant cells. In contrast, the claimed invention has the specific utility of producing an 
enzyme that catalyzes the transfer of an isopentenyl moiety from dimethylallyl 
pyrophosphate (DMAPP) to the adenosine immediately adjacent 3' to the anticodon of 
tRNAs whose anticodons terminate with uridine, resulting in N6-(Δ2-isopentenyl)adenosine
("i6A"). Applicants point out that the broad class of the invention does not possess this
specific utility. Hence, the claimed invention meets the requirement of having a specific utility.

MPEP 2107.01 states that a "substantial utility" defines a "real world" use, and
that utilities which require or constitute carrying out further research to identify or reasonably
confirm a "real world" context of use are not substantial utilities. As examples, MPEP
2107.01 indicates that an assay that measures the presence of a material which has a stated
correlation to a predisposition to the onset of a particular disease condition would also define 
a "real world" context of use in identifying potential candidates for preventive measures or 
further monitoring. 

In the present application, the specification teaches that a mutant gro-1 protein 
led to an altered lifespan and cellular metabolism in the nematode, and that the activity of this 
gene in other animals, such as yeast and in humans, is related to a physiological clock which 
coordinates aspects of cellular physiology, from cell division, growth, to aging. It is known 
in the art that mutations in a bacterial homolog miaA affect cellular growth in many ways, 
such as decreased suppression of nonsense mutation, slowed ribosomal translation, etc. The 
specification also teaches that miaA mutations increase the rate of spontaneous mutations. 
See specification at paragraph bridging pages 18-19, and first full paragraph on page 19. It 
is also known in the art that mutation in the yeast homolog MOD5 similarly prevented
suppression of certain nonsense mutations. As discussed in the previous Amendment,
Golovko et al.¹ ("Golovko") discloses that the human homolog hgro-1 can complement the
suppression function of MOD5. Applicants submit that one skilled in the art would believe
that a mutation in this highly conserved gene would increase the rate of spontaneous mutation 
or decrease suppression of nonsense mutation, and that it would reasonably correlate with a 
mechanism that contributes to the formation of cancer. Hence, there is a substantial and 
credible utility in monitoring the expression and mutation of hgro-1 in humans. 

According to applicable case law, applicants do not have to provide evidence
sufficient to establish that an asserted utility is true "beyond a reasonable doubt." In re Irons,
340 F.2d 974, 978, 144 USPQ 351, 354 (CCPA 1965). All that is required in evaluating the
credibility of an asserted utility is a preponderance of the totality of the evidence under
consideration. In re Oetiker, 977 F.2d 1443, 1445, 24 USPQ2d 1443, 1444 (Fed. Cir. 1992).
A preponderance of the evidence exists when it suggests that it is more likely than not that
the assertion in question is true. Herman v. Huddleston, 459 U.S. 375, 390 (1983). Here,
Applicants submit that the totality of facts and reasoning suggests that it is more likely than
not that the statement of the applicant is true.

¹ Golovko et al., "Cloning of a human tRNA isopentenyl transferase," Gene (2000) 258:85-93.

The Examiner alleges that disclosure of SEQ ID NO:3 is simply a starting 
point for further research and investigation into potential practical uses of the claimed nucleic
acids. Applicants submit that the Examination Guidelines for the Utility Requirement 
("Examination Guidelines") cautioned not to interpret "immediate benefit to the public" to 
mean that products or services based on the claimed invention must be "currently available" 
to the public in order to satisfy the utility requirement. Brenner v. Manson, 383 U.S. 519,
534-35, 148 USPQ 689, 695 (1966). Rather, any reasonable use that an applicant has 
identified for the invention that can be viewed as providing a public benefit should be 
accepted as sufficient, at least with regard to defining a "substantial" utility. 

Furthermore, the claimed invention provides more than one utility. According
to MPEP 2107.01, an assay method for identifying compounds that themselves have a
substantial utility also defines a real world context of use. In the present application, the
specification discloses that the substrate of IPT is DMAPP, which is a precursor of the lipid
side-chain of ubiquinone in bacteria, and related to synthesis of cholesterol and its derivatives
in eukaryotes (see page 23, first full paragraph). Applicants invite the Examiner's attention
to Benko et al.² ("Benko"), wherein it is taught that the yeast tRNA biosynthetic pathway and
the sterol biosynthetic pathway compete for DMAPP, which is the substrate of MOD5 in
yeast. At the time of Benko's publication, the human homolog of MOD5 (i.e., hgro-1) was
not known and Benko did not disclose or suggest using hgro-1. Benko discloses an assay that
is based on using yeast MOD5 to screen for inhibitors that reduce i6A modification, thereby
affecting the distribution of DMAPP between tRNA synthesis and sterol synthesis. Benko
indicates that the yeast-based assay can be developed to identify new drugs that can affect the
pathways of cholesterol synthesis and synthesis of farnesyl-pyrophosphate-derived products
independently. Applicants submit, and one of skill in the art would recognize from the
specification, that hgro-1 produced by the claimed invention can be used additionally to
screen for therapeutic compounds that interfere with cholesterol biosynthesis. Such a utility
is well established, specific, has a real world context and is believable to a person of ordinary
skill in the art as evidenced by the peer-reviewed publication.

² Benko et al., "Competition between a sterol biosynthetic enzyme and tRNA modification in addition to
changes in the protein synthesis machinery causes altered nonsense suppression," January 4, 2000, Proc Natl
Acad Sci USA 97(1):61-6, submitted herewith as reference BN.

According to the Examination Guidelines for the Utility Requirement
("Examination Guidelines"), if the applicant has asserted that the
claimed invention is useful for any particular practical purpose and the assertion would be 
considered credible by a person of ordinary skill in the art, the Examiner should not impose a 
rejection based on lack of utility (66 FR 1098, Jan. 5, 2001). Applicants submit that the 
claimed invention satisfies the utility requirements under 35 U.S.C. section 101. 

In the Office Action, the Examiner contends that it appears from the work of 
Ushijima et al. that methylation of a specific area of DNA or a specific group of genes is 
more important than the overall level of DNA methylation in tumors. The Examiner further 
contends that there are contradictions in the cited published literature and that it can be 
concluded that DNA methylation and its relationship to cancer is unreliable. Applicants 
respectfully disagree with these contentions. Even assuming that there are controversies in 
the literature, the asserted utility is not wholly inconsistent with contemporary knowledge in 
the art such that the utility is not credible. Applicants point out that these assertions relate to 
DNA methylation generally and lack specificity with respect to the claimed invention. MPEP 
2107.02 (IV) states that it is imperative that Office personnel use specificity in setting forth a 
rejection under 35 U.S.C. 101. However, these contentions are moot because a patent
applicant need show utility for only one disclosed purpose. See Raytheon Co. v. Roper
Corp., 724 F.2d 951, 958, 220 U.S.P.Q. 592 (Fed. Cir. 1983), cert. denied, 469 U.S. 835
(1984); Ex parte Lanham, 121 U.S.P.Q. 223 (Pat. Off. Bd. App. 1958). In view of the
evidence and reasoning provided in the foregoing paragraphs, Applicants have shown more
than one disclosed utility that meets the utility requirements.

As such, Applicants respectfully request that the rejection under 35 U.S.C. 101
be withdrawn.

5. THE REJECTION UNDER 35 U.S.C. § 112, FIRST PARAGRAPH, FOR 
LACK OF ENABLEMENT SHOULD BE WITHDRAWN 

Claims 21-24 are also rejected under 35 U.S.C. 112, first paragraph. The
Examiner contends that since the claimed invention is not supported by either a credible, 
specific and substantial asserted utility or a well established utility for the reasons set forth 
above, one skilled in the art clearly would not know how to use the claimed invention. 

Applicants traverse this rejection on the ground that claims 21 to 24 have
significant patentable utility as discussed in the Section above. When an Applicant
satisfactorily rebuts a rejection based on a lack of utility under 35 U.S.C. § 101, the
corresponding rejection imposed under 35 U.S.C. § 112, first paragraph, should also be
withdrawn. Thus, Applicants respectfully request that the rejection of claims 21-24 under 35
U.S.C. § 112, first paragraph, be withdrawn.

6. THE REJECTIONS UNDER 35 U.S.C. § 102(b) FOR ANTICIPATION
SHOULD BE WITHDRAWN 

Claim 22 is rejected under 35 U.S.C. § 102(b) as being anticipated by Hudson 
("Hudson", Accession number G24438, May 31, 1996). Applicants respectfully disagree. 

Anticipation under 35 U.S.C. § 102 requires identity of invention. The court
made it absolutely clear that "anticipation requires that all of the elements and limitations of
the claim are found within a single prior art reference [and] ... [t]here must be no difference
between the claimed invention and the reference disclosure, as viewed by a person of
ordinary skill in the field of the invention." Scripps Clinic & Research Fdn. v. Genentech
Inc., 927 F.2d 1565, 1576 (Fed. Cir. 1991).

Claim 22 is drawn in part to a complement of a polynucleotide that encodes a 
polypeptide encoded by SEQ ID NO:3, which polypeptide is the human homolog of the C. 
elegans GRO-1 protein (hgro-1). The amino acid sequence of hgro-1 is depicted in Figure 9 
and is now assigned new SEQ ID NO: 63. In contrast, Hudson discloses human STS
WI-12773, which is a complement to only residues 1778-2029 of SEQ ID NO:3. Hudson does
not disclose a polynucleotide that encodes hgro-1, and thus, does not anticipate claim 22.
This rejection is in error, and should be withdrawn. 

Claims 23-24 are rejected under 35 U.S.C. § 102(b) as being anticipated by
Bonaldo et al. (Genome Research, 1996, vol. 6(9), 791-806, "Bonaldo") as evidenced
by the sequence database entry accession number BM721352. The Examiner alleges that
Bonaldo teaches Homo sapiens cDNA clone UI-E-EO1 which comprises residues 1121-1210
of SEQ ID NO:3. The Examiner admits that Bonaldo does not disclose that said clone
encodes a fragment comprising a zinc finger motif, and does not disclose that said fragment
can rescue the e2400 phenotype, but contends that, because the disclosed polynucleotide
allegedly meets the required limitation of comprising residues 1121-1210 of SEQ ID NO:3, it is
reasonable to assume that it would have the same inherent properties of rescuing the e2400
phenotype as claimed. Applicants respectfully disagree.

According to the FEATURES section of sequence entry BM721352,
UI-E-EO1 is a normalized cDNA library (not a cDNA clone as alleged by the
Examiner) constructed according to the method taught in Bonaldo. Contrary to the
Examiner's allegation, Bonaldo does not disclose human cDNA library UI-E-EO1 (see
Bonaldo attached hereto as Exhibit A, in particular Table 1 on page 793), and does not
disclose the nucleotide sequence of any cDNA clone. Thus, Bonaldo does not anticipate claims
23 and 24. The rejection is in error.

Applicants respectfully point out that the sequence entry BM721352
corresponding to clone UI-E-EO1-aib-b-20-0-UI was created on March 1, 2002. As such, the
public disclosure of the nucleotide sequence in BM721352 postdates February 25, 2000,
which is the priority date of the present application. Therefore, sequence entry BM721352 is
not prior art to the claimed invention and the rejection is in error.

In view of the foregoing, the rejections under 35 U.S.C. § 102(b) should be
withdrawn.

CONCLUSION 

Applicants respectfully request that the foregoing amendments and remarks be 
made of record in the file history of the instant application. Applicants believe that the 
remarks and amendments made herein now place the pending claims in condition for 
allowance. 



Respectfully submitted,

Date: March 23, 2004

Laura A. Coruzzi (Reg. No. 30,742)

By: T. Christopher Tsang (Reg. No. 40,258)

Jones Day
222 East 41st Street
New York, New York 10017
(212) 790-9090





RESEARCH 

Normalization and Subtraction: Two
Approaches to Facilitate Gene Discovery

Maria de Fatima Bonaldo,¹ Gregory Lennon,³ and
Marcelo Bento Soares¹,²

¹Department of Psychiatry, College of Physicians and Surgeons of Columbia University, and ²The New
York State Psychiatric Institute, New York, New York 10032; ³Human Genome Center, Lawrence
Livermore National Laboratory, Livermore, California 94551

Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach
to discover (putatively) expressed sequences that, in turn, once mapped, may greatly expedite the process
involved in the identification and cloning of human disease genes. However, the integrity of the data and the
pace at which novel sequences can be identified depends to a great extent on the cDNA libraries that are
used. Because altogether, in a typical cell, the mRNAs of the prevalent and intermediate frequency classes
comprise as much as 50-65% of the total mRNA mass, but represent no more than 1000-2000 different
mRNAs, redundant identification of mRNAs of these two frequency classes is destined to become
overwhelming relatively early in any such random gene discovery programs, thus seriously compromising
their cost-effectiveness. With the goal of facilitating such efforts, previously we developed a method to
construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (1NIB) and
fetal liver/spleen (1NFLS) libraries, from which [illegible] expressed sequence tags,
respectively, have been derived. While improving the representation of the longest cDNAs in our libraries,
we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of
which have been contributed to our Integrated Molecular Analysis of Genomes and Their Expression
(IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. In an attempt to
facilitate the process of gene discovery further, we have also developed a subtractive hybridization approach
designed specifically to eliminate (or reduce significantly the representation of) large pools of arrayed and
(mostly) sequenced clones from normalized libraries yet to be (or just partly) surveyed. Here we present a
detailed description and a comparative analysis of four methods that we developed and used to
normalize cDNA libraries from human (15), mouse (3), rat (2), as well as the parasite Schistosoma mansoni (1). In
addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library
(1NFLS-S1) that resulted from the elimination (or reduction of representation) of [illegible] clones
from the 1NFLS library.



Large-scale single-pass sequencing of cDNA
clones randomly picked from libraries has proven
to be a powerful approach to discover genes
(Adams et al. 1991, 1993a,b, 1995; Khan et al. 1992;
McCombie et al. 1992; Okubo et al. 1992;
Matsubara and Okubo 1993; see also Hillier et al., this
issue). However, the significance of using cDNA
libraries that are well suited for this purpose
should not be underestimated (Adams et al.
1993b).

Ordinary cDNA libraries may contain a high
frequency of undesirable ("junky") clones
(Adams et al. 1991, 1992) that may not only
drastically impair the overall efficiency of the
approach, but also seriously compromise the
integrity of the data that are generated. Among such
junky clones are: (1) clones that consist
exclusively of poly(A) tails of mRNAs; (2) clones that
contain very short cDNA inserts; (3) clones that
contain nothing but the 3' half of the
NotI-oligo(dT)18 primer used for synthesis of
first-strand cDNA ligated to an adaptor; and (4)
chimeric clones, i.e., cDNAs derived from different
mRNAs joined artifactually during ligation.
Furthermore, given that, as a general rule, the
frequency of occurrence of a cDNA clone in a library
is equivalent to that of its corresponding mRNA
in the cell, even high-quality cDNA libraries may
not be ideal for large-scale sequencing.



6:791-806 ©1996 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/96 $5.00

GENOME RESEARCH 791



BONALDO ET AL. 

Reassociation-kinetics analysis indicates that 
the mRNAs of a typical somatic cell are distrib- 
uted in three frequency classes: (1) superpreva- 
lent (consisting of about 10-15 mRNAs that alto- 
gether represent 10-20% of the total mRNA 
mass); (2) intermediate (1000-2000 mRNAs; 40- 
45%); and (3) complex (15,000-20,000 mRNAs; 
40-45%) (Bishop et al. 1974; Davidson and Brit- 
ten 1979). Accordingly, once most mRNAs of the 
prevalent and intermediate frequency classes are 
identified, redundancy levels are expected to be- 
come greater than 60%. For this reason, the use 
of normalized libraries, in which the frequency of 
all clones is within a narrow range (Soares et al.
1994), has been shown to be beneficial for large-
scale sequencing (Berry et al. 1995; Houlgatte et
al. 1995). Calculations show that at C₀t = 5.5
[where C₀ is the total DNA concentration and t is
the time (moles nucleotides per liter × sec)], of
the three kinetic classes of mRNAs, the most
abundant species are diminished drastically,
while all frequencies are brought within the
range of one order of magnitude (Soares et al.
1994).

However, because a large fraction of all hu- 
man genes has been identified already, redun- 
dant identification of genes that are expressed in 
multiple tissues cannot be avoided simply by the 
use of normalized libraries. Hence, we argue that 
the use of subtractive cDNA libraries enriched for 
genes expressed at low levels and that have not 
yet been identified should become increasingly 
more advantageous for large-scale sequencing 
programs. 

While attempting to improve the representa- 
tion of the longest cDNAs in our libraries, we 
developed three methods for construction of nor- 
malized libraries, in addition to the procedure 
that we described previously (Soares et al. 1994), 
and used them successfully to generate normal- 
ized cDNA libraries from human (15), mouse (3), 
rat (2), and Schistosoma mansoni (1) tissues. All
human and mouse cDNA libraries have been con- 
tributed to the Integrated Molecular Analysis of 
Genomes and Their Expression (IMAGE) Consor- 
tium (Lennon et al. 1996), and to date a total of 
315,408 expressed sequence tags (ESTs) have 
been derived from these libraries (dbEST release 
052396; http://www.ncbi.nlm.nih.gov). 

Here we present a detailed description and a 
comparative analysis of the four methods that we 
have developed to normalize cDNA libraries; we 
describe a simple procedure for the construction 
of subtractive cDNA libraries; and we discuss 



strategies that take advantage of subtractive hy- 
bridization to expedite the ongoing IMAGE/ 
Washington University/Merck gene discovery 
program. 

RESULTS 

While attempting to improve the representation
of the longest cDNAs in our normalized libraries,
we developed four methods and constructed over 
35 libraries, most of which are described here. A 
list comprising 15 human, three mouse, two rat, 
and one schistosome library with their respective 
names, number of recombinants, sequence tags, 
and methods used for normalization and prepa- 
ration of single-stranded plasmids is shown in 
Table 1. 

Extensive characterization of two normalized
libraries [normalized infant brain (1NIB) and
normalized fetal spleen (1NFLS)] constructed
according to our previously described procedure (Soares
et al. 1994; here designated as method 1)
confirmed our original observations that a great
extent of normalization can be achieved with this
method for most cDNA species (e.g., cf. lanes
9,10 in Fig. 1M-P). It is noteworthy that the
frequency of cDNA 122 (used as the probe in P) was
increased with normalization from <0.0006% in
the starting library to 0.007% in the 1NIB library
(Soares et al. 1994). However, Southern
hybridization of starting and normalized libraries with a
battery of cDNA probes revealed that on occasion
truncated clones were favored over their longest
counterparts during the process. This was first
observed when Southern blots of NotI + HindIII-
digested plasmid DNA from starting and
normalized infant brain libraries were hybridized with a
cDNA probe for mitochondrial 16S rRNA (see Fig.
1L, lanes 9,10). Not only was the frequency of
these mitochondrial cDNA clones not reduced
effectively during the process of normalization
(frequency of occurrence in starting and normalized
infant brain libraries was 1.4% and 1.0%,
respectively), but also the length of the hybridizing
cDNAs was noticeably smaller in the normalized
library. Comparative sequence analysis (not
shown) of a number of hybridizing mitochondrial
16S rRNA clones from both starting and
normalized libraries revealed that whereas the 3' end
of most cDNAs derived from the starting library
corresponded to the bona fide 3' end of the 16S
rRNA, the 3' end of the majority of the cDNAs
isolated from the normalized library
corresponded to sequences further upstream on the



792 GENOME RESEARCH



cDNA-BASED APPROACHES TO FACILITATE GENE DISCOVERY



Table 1. Complete List and Main Features of the Normalized Human, Mouse, Rat, and
Schistosome cDNA Libraries

mRNA source                         Normalized              Number of       Preparation of   Method of      Library
                                    library name            recombinants    single-stranded  normalization  tag
                                                            in the          plasmids
                                                            normalized
                                                            library

Human infant brain                  1NIB                    2,500,000       in vivo          1              ACCAA
Human fetal liver spleen            Nb2HFLS20W (1NFLS)      9,000,000       in vivo          1              ACATCT
                                    5Nb2HFLS20W             3,200,000       in vitro         2-1
                                    6Nb2HFLS20W             1,400,000       in vitro         2-3
                                    14Nb2HFLS20W            3,200,000       in vitro         4
                                    15Nb2HFLS20W            35,000,000      in vitro         2-2
Human term placenta                 Nb2HP                   750,000         in vivo          2-1            ACCAA
Human 8-9W placenta                 2NbHP8-9W               100,000         in vitro         2-3            CA
Human breast                        2NbHBst-3NbHBst         2,090,000       in vivo          2-1            CC
Human adult brain                   N2b4HB55Y-N2b5HB55Y     3,170,000       in vivo          2-1            CC
Human retina                        2N2b4HR-N2b5HR          1,600,000       in vivo          2-1            AC
Human pineal gland                  3NbHPC                  1,000,000       in vitro         2-1            CC
Human ovary tumor                   NbHOT                   1,100,000       in vivo          2-1            CC
Human melanocytes                   2NbHM                   6,800,000       in vitro         2-3            AC
Human fetal heart                   NbHH19W                 9,700,000       in vitro         4              ATC
Human parathyroid adenoma           NbHPA                   3,400,000       in vitro         4              ACCAA
Human senescent fibroblast          NbHSF                   9,900,000       in vitro         4              AACCA
Human multiple sclerosis plaques    2NbHMSP                 1,100,000       in vitro         3              CA
Human fetal lung                    NbHL19W                 21,700,000      in vitro         4              AA
19.5-dpc mouse embryo               p3NMF19.5               3,400,000       in vitro         4              ACAAC
17.5-dpc mouse embryo               NbME17.5                6,800,000       in vitro         4              CACAC
13.5- to 14.5-dpc mouse embryos     NbME13.5-14.5           380,000         in vitro         4              CCAAA
Rat heart                           NbRH                    400,000         in vitro         4              ACAAC
Rat kidney                          2NbRK                   130,000         in vitro         4              CAAAC
S-week-old adult schistosome        NbSSW                   1,000,000       in vitro         4              CAAAG

[...] tissue was kindly provided by Dr. Anne Bowcock and Ms. Monique Spillman, University of Texas Southwestern [...]

2NbHBst differs from 3NbHBst in the Cot used for hybridization [...]

Total normal human retina RNA (kindly provided by Dr. Roderick R. McInnes [...]

[...] was obtained from a 55-year-old Caucasian male.

Human pineal gland [kindly provided by Dr. David Klein, [...] Development, National Institutes of Health (NIH)] was derived from a group of three pineal glands ([...] gland 3: 20-year-old African American male).

[...] Bowcock and Ms. Monique Spillman, University of Texas [...]

Total cellular RNA from multiple sclerosis plaques (kindly provided by [...]) [...] from one patient.

Total cellular RNA for construction of the mouse (C57BL/6J strain; [...]) embryonic libraries was kindly provided by Dr. Minoru Ko (Wayne State University, [...]

[...] Cleveland, OH.



BONALDO ET AL. 

Figure 1 Comparative analysis of starting and normalized
cDNA libraries by Southern hybridization with 14
cDNA probes. The 0.015 µg PacI + EcoRI-digested plasmid
DNA from the starting fetal liver/spleen library (lane
6), from the normalized fetal liver/spleen libraries
constructed according to method 2-1 (lane 1), method 2-3
(lane 2), method 2-2 (lane 3), method 1 (lane 4), method
4 (lane 5), and from the liver/spleen mini-libraries
enriched for abundant cDNAs (HAP-bound fractions)
generated with method 2-1 (lane 7) and method 4 (lane 8)
were electrophoresed on 1% agarose gels, transferred to
nylon membranes (GeneScreenPlus; DuPont/NEN) and
hybridized at 42°C in 50% formamide, 5x Denhardt's
solution, 0.75 M NaCl, 0.15 M Tris (pH 7.5), 0.1 M sodium
phosphate, 0.1% sodium pyrophosphate, 2% SDS
containing sheared and denatured salmon sperm DNA at 100
µg/ml. Similarly, 0.05 µg NotI + HindIII-digested plasmid
DNA from the starting (IB; lane 9) and normalized (1NIB;
lane 10; method 1) infant brain libraries (Soares et al.
1994) were electrophoresed, transferred, and hybridized
as described above. Radioactive probes were prepared by
random primed synthesis using the Prime-it II kit (Strata- - 

specified above. 




16S rRNA. The occurrence of such 3' truncations
was also documented by sequence analysis (not 
shown) for serum albumin cDNAs in the fetal 
liver/spleen library (see Fig. 1D,E, lanes 4,6). 

Reasoning that this problem could be cir- 
cumvented if the fragments used in the hybrid- 
ization with the single-stranded circles (1) were 
in excess, and (2) spanned the entire length of 
the cDNAs, we developed an alternative proce- 
dure to normalize cDNA libraries based on hy- 
bridization of in vitro synthesized RNA (driver) 
from an entire library with the library itself in the 
form of single-stranded circles (tracer) (see
methods 2-1 and 2-2 in Fig. 2). Several normalized
libraries were generated by this procedure (see
Table 1).

Southern hybridization of endonuclease- 
restricted plasmid DNA from starting and nor- 
malized libraries with a number of cDNA probes 
(Fig. 1) indicated clearly that these methods ef- 
fectively improved the representation of the 
longest cDNAs in the normalized libraries (e.g., 
cf. lanes 1,4 in Fig. 1A,D,E,G,H). However,
characterization of one of these libraries (5Nb2HFLS20W)
by colony hybridization with cDNA probes (not
shown) indicated that this approach was effective
in reducing the frequency of some, but not all, of
the most abundant clones (e.g., serum albumin
was reduced about 20-fold, whereas γ-globin was
reduced only twofold). No difference was
observed when hybridizations were performed under
different conditions [0.4 M NaCl and 50%
formamide at 42°C as in methods 2-1 and 2-3; 0.12
M NaCl, 50% formamide, and 1% sodium dodecyl
sulfate (SDS) at 30°C as in method 2-2 (see lane 3
in Fig. 1); 0.4 M NaCl and 80% formamide at
42°C, not shown].

It is noteworthy that Northern hybridization 
(not shown) of in vitro transcribed RNA synthe- 
sized from an entire plasmid library with probes 
derived from the abundant cDNAs that failed to 
be normalized effectively by this procedure (e.g., 
globins in the fetal liver/spleen library and glyc- 
eraldehyde-3-phosphate dehydrogenase (G3PD) 
in the breast library) indicated that they were not 
as prevalent in the population of in vitro tran- 
scribed RNAs as they were in their respective 
starting cDNA libraries. 






[Figure 2 diagram. Starting library: in vitro transcription (RNA
driver) and conversion to single-stranded circles with Gene II and
Exo III; hybridization in the presence of blocking oligonucleotides;
HAP chromatography. HAP flow-through: convert to double-stranded
circles and electroporate into DH10B (amplified normalized library;
methods 2-1 and 2-2). HAP bound: convert to double-stranded circles
and electroporate into DH10B (double-stranded plasmid mini-library
enriched for abundant RNAs); in vitro transcription; hybridization
in the presence of blocking oligonucleotides; HAP flow-through;
convert to double-stranded circles and electroporate into DH10B
(amplified normalized library; method 2-3).]

Figure 2 Diagram of the normalization
methods 2-1, 2-2, and 2-3. Double-stranded
plasmid DNA representing an entire starting
library is (1) linearized with either SfiI,
NotI, or PacI and used as template for
synthesis of RNA in vitro using T3 or T7
RNA polymerases, and (2) converted to
single-stranded circles either in vivo, upon
electroporation into DH5αF' and superinfection
with M13K07, or in vitro by the combined
action of Gene II and Exonuclease III (Life
Technologies). Single-stranded plasmid DNA
is HAP-purified and hybridized (Cot ~5) with
excess RNA (pretreated with RNase-free
DNase I; Promega), blocked with appropriate
oligonucleotides to prevent hybridization
through common vector sequences (see Methods
section). Both the fraction that remains
single-stranded (flow-through) and the
resulting hybrids (bound) are purified by HAP
chromatography. The HAP flow-through
fraction is converted to double-stranded
plasmids, electroporated into DH10B bacteria
(Life Technologies), and propagated
under ampicillin selection to generate an
amplified normalized library (methods 2-1
and 2-2, depending on the conditions used
for hybridization; see Methods section).
The HAP-bound fraction is similarly converted
to double-stranded plasmids, electroporated
into bacteria, and propagated
under ampicillin selection to generate a
mini-library enriched for abundant cDNAs.
Double-stranded plasmid DNA from this
mini-library is linearized and used as
template for synthesis of RNA in vitro. After
digestion of the plasmid DNA template with
ribonuclease-free DNase I (Promega), the
RNA (driver) is blocked with appropriate
oligonucleotides and hybridized (Cot
~100-200) with HAP-purified single-stranded
plasmids derived from the starting library
(see above). The remaining single-stranded
circles are purified by HAP chromatography,
converted to double-stranded circles,
electroporated into DH10B (Life Technologies),
and propagated under ampicillin selection to
generate an amplified normalized library
(method 2-3).



A significantly improved extent of normal- 
ization was achieved when runoff RNA synthe- 
sized from the plasmid mini-library enriched for 
abundant cDNAs (hydroxyapatite (HAP)-bound 
fraction of method 2-1 in Fig. 2) was hybridized 
(Cot = 100-200) with single-stranded circles from 
the starting library (see method 2-3 in Fig. 2 and 
Table 1; cf. lanes 1,2 in Fig. 1A-D,F,G). 

In an effort to preserve the positive
characteristics of both methods 1 and 2 (i.e., the
adequate extent of normalization achieved with
method 1, and the improved representation of
the longest cDNAs achieved with method 2), we
developed two additional reassociation
kinetics-based procedures involving DNA-DNA
hybridization (methods 3 and 4; see Fig. 3).

Method 3, which was successfully used to
construct a normalized library from multiple
sclerosis plaques (see 2NbHMSP in Table 1),
involved hybridization of a 20-fold excess of
single-stranded cDNA fragments (comprising the 5'
halves of all inserts of the starting library,
generated by Exonuclease III digestion of gel-purified
double-stranded cDNAs; see Fig. 3) with
complementary single-stranded circles produced in vitro
by the combined action of Gene II and
Exonuclease III (Life Technologies).

Southern hybridization of NotI + EcoRI-
digested plasmid DNA from the starting and
normalized (with methods 2-1 and 3) multiple
sclerosis plaques library with mitochondrial 16S
rRNA and myelin basic protein cDNA probes (not
shown) clearly indicated that method 3 was
superior to method 2-1 in that a much greater
extent of normalization was achieved, at the same
time that it maintained (similar to method 2-1)
appropriate representation of the longest cDNAs
in both cases.

For the libraries constructed with method 4 
(see Table 1 and Fig. 3), double-stranded cDNA 
inserts generated by the polymerase chain reac- 
tion (PCR) with T3 and T7 primers were melted 
and hybridized (in the presence of vast excess of 
blocking oligonucleotides) with single-stranded 
plasmid library DNA prepared in vitro. 

Southern hybridization of PacI + EcoRI-
digested plasmid DNA from starting and normalized
(with methods 1, 2-1 to 2-3, and 4) fetal liver/
spleen libraries (Fig. 1) with several cDNA probes
(including those that revealed incomplete
normalization with methods 2-1 to 2-3, such as
α-globin, β-globin, and γ-globin) demonstrated the
efficacy of method 4 in achieving the desired
extent of normalization obtained with method 1
(cf. lanes 1-6 in Fig. 1A-D,F-H, and lanes 3-6 in
Fig. 1I-K) while preserving the representation of
the longest cDNAs (e.g., the longest albumin
cDNA was present in the normalized library
prepared with method 4, shown in lane 5 of Fig.
1D,E, but it was undetectable in the normalized
library constructed with method 1, shown in
lane 4; a similarly remarkable difference was
revealed with the cDNA probe for H19 RNA, shown
in Fig. 1G,H). Characterization of the normalized
library generated with method 4 by colony
hybridization with 10 cDNA probes (not shown),
which occur at a wide range of frequencies in the
starting library, confirmed the effectiveness of
the procedure in narrowing their frequencies down
to within one order of magnitude (e.g., the
frequencies of the cDNAs for γ-globin, α-globin,
β-globin, H19 RNA, and transferrin were reduced



[Figure 3 diagram. Directionally cloned plasmid library. Method 3:
excise cDNA inserts and purify from cloning vector; digest with
Exo III; hybridization. Method 4: PCR amplification (T3 + T7
primers); purification from primers; hybridization (Cot ~5).
Flow-through: convert to double-stranded circles and electroporate
into DH10B (amplified normalized library). Bound: mini-library
enriched for abundant mRNAs.]

Figure 3 Diagram of the normalization methods
3 and 4. In method 3, double-stranded plasmid DNA
from a starting library is digested with restriction
enzymes that generate 5' protruding ends;
the excised cDNA inserts are gel-purified from the
cloning vector and digested with Exonuclease III
to yield noncomplementary single-stranded
fragments, each representing half of a cDNA insert.
Note that the single-stranded fragments that
span the 5' half (but not the 3' half) of the cDNA
inserts are complementary to single-stranded
plasmids prepared in vitro. These single-stranded DNA
fragments are blocked with appropriate
oligonucleotides (see Methods) and hybridized with
single-stranded library DNA prepared in vitro
(middle column). The remaining single-stranded
circles are HAP-purified, converted to
double-stranded plasmids, electroporated into DH10B
bacteria (Life Technologies), and propagated under
ampicillin selection to generate a normalized
library. In method 4, single-stranded library DNA is
used as template for PCR amplification with T3
and T7 primers. PCR-amplified cDNAs are purified
from excess primers, melted, and hybridized with
single-stranded library DNA in the presence of
blocking oligonucleotides. The remaining
single-stranded circles are purified by HAP
chromatography, converted to double-stranded plasmids,
electroporated into bacteria, and propagated
under ampicillin selection to generate a normalized
library.






from 9.2%, 6.4%, 3.6%, 1.8%, and <0.2% to 0.04%,
0.02%, 0.01%, 0.1%, and 0.1%, respectively).
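
The effect of these figures can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: it takes the before and after frequencies quoted above, computes the fold reduction for each cDNA, and expresses the spread of frequencies as a number of orders of magnitude. Transferrin is reported only as "<0.2%" before normalization, so 0.2% is used as an upper bound (an assumption of this sketch), and the names `FREQUENCIES` and `spread_in_orders_of_magnitude` are hypothetical.

```python
import math

# Frequencies (% of clones) quoted in the text for the fetal liver/spleen
# library before and after normalization with method 4.
# Assumption: transferrin is given only as "<0.2%", so 0.2 is an upper bound.
FREQUENCIES = {
    "gamma-globin": (9.2, 0.04),
    "alpha-globin": (6.4, 0.02),
    "beta-globin": (3.6, 0.01),
    "H19 RNA": (1.8, 0.1),
    "transferrin": (0.2, 0.1),
}


def spread_in_orders_of_magnitude(values):
    """Return log10 of the ratio between the largest and smallest value."""
    return math.log10(max(values) / min(values))


before = [b for b, _ in FREQUENCIES.values()]
after = [a for _, a in FREQUENCIES.values()]

for name, (b, a) in FREQUENCIES.items():
    print(f"{name:13s} {b:5.2f}% -> {a:5.2f}%  ({b / a:6.1f}-fold reduction)")

print(f"spread before: {spread_in_orders_of_magnitude(before):.2f} orders of magnitude")
print(f"spread after:  {spread_in_orders_of_magnitude(after):.2f} orders of magnitude")
```

Because the starting transferrin frequency is an upper bound, the spread computed for the starting library is a lower limit; the spread after normalization depends only on the quoted post-normalization values.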

In order to assess further the ability of these
normalization procedures to preferentially
reduce the representation of the most abundant
cDNAs, we have performed a comparative
sequence analysis (not shown) of 100 clones picked
randomly from the fetal liver/spleen cDNA
library normalized with method 4 (14Nb2HFLS20W
in Table 1; HAP-flow-through fraction in Fig. 3),
and from two fetal liver/spleen mini-libraries
enriched for abundant cDNAs (HAP-bound
fractions in Figs. 2 and 3) obtained during HAP
purification of the normalized libraries prepared
according to methods 2-1 (5Nb2HFLS20W) and 4
(14Nb2HFLS20W). A number of cDNAs known to
be prevalent in the starting fetal liver/spleen
library (e.g., albumin, γ-globin, α-globin, β-globin,
mitochondrial RNAs, and apolipoproteins A and
H) were found at increased frequencies in both
mini-libraries enriched for abundant cDNAs, but
none of them was represented in the sample of
100 clones from the normalized library. It is
noteworthy that while 47% of the sequences derived
from the normalized library were not represented
in the "all nonredundant" subdivision of
sequences of GenBank + EMBL + DDBJ + PDB, the
majority of the sequences obtained from the
mini-libraries of abundant cDNAs derived from
methods 2-1 and 4 (91.4% and 86.9%,
respectively) did have homologous sequences in that
data base. Furthermore, although 49% of the
sequences derived from the normalized library had
fewer than 10 homologous ESTs in the dbEST
subdivision of GenBank, most of the sequences
obtained from both mini-libraries had more
than 10 homologous ESTs in the dbEST data base
(92.5% and 89.7%, respectively, in the HAP-
bound fractions of methods 2-1 and 4).

With the ultimate goal of facilitating the
ongoing process of gene discovery by large-scale
sequencing of cDNA clones picked randomly from
libraries, we have performed a pilot subtractive
hybridization experiment to eliminate (or reduce
the representation of) a pool of approximately 5000
IMAGE Consortium-arrayed cDNA clones (pool
no. 1, LLAM 78-90) from the normalized library
from which they were derived (1NFLS in Table 1).
PCR-amplified cDNA inserts from pool no. 1 were
melted and hybridized, in the presence of
blocking oligonucleotides, with single-stranded
plasmid DNA from the 1NFLS library, prepared in
vitro. The remaining single-stranded circles were
purified by HAP chromatography, converted to
double-stranded plasmids, electroporated into
bacteria, and propagated under antibiotic
selection to generate the subtracted 1NFLS-S1 library
(see Fig. 4). Preliminary characterization of the
1NFLS-S1 library by Southern hybridization with
10 cDNA probes (only five are shown; see Fig. 5)
known to be represented in pool no. 1 indicated
clearly the effectiveness of the procedure in
eliminating (or reducing the representation of) all
10 cDNA sequences in the 1NFLS library. A
BLASTN search of the dbEST division of GenBank
(6/12/96) with 3' ESTs obtained from the five
probes (cDNAs 1, 4, 8, 9, and 10) the
hybridizations of which were not shown in Figure 5
revealed the presence of 0, 0, 1, 2, and 2
corresponding ESTs, respectively, from the 1NFLS
library, thus indicating that the subtraction was
successful even for cDNAs that were
underrepresented in the normalized library (a total of
44,407 3' ESTs have been derived from the 1NFLS
library to date). It should be noted that because of
sequencing failures, some of the clones in these
arrays may not yet have corresponding ESTs in
the public data bases.

It is noteworthy that when we attempted to
perform the same subtractive hybridization
experiment using, as driver, RNA synthesized in
vitro from a plasmid DNA preparation of pool no.
1, the results obtained were not satisfactory (not
shown) in that subtraction could be
demonstrated for some but not all tested clones (e.g.,
α-globin could not be subtracted effectively),
similar to what we observed in normalizations
with method 2-1.

DISCUSSION 

As a result of an effort to improve the
representation of the longest cDNAs in normalized
libraries, we have developed four different
methods for normalization of directionally cloned
cDNA libraries constructed in phagemid vectors,
while contributing resources to the IMAGE
Consortium (Lennon et al. 1996) and thereby
facilitating the ongoing gene discovery and mapping
programs. Approximately 87.5% of all (human)
IMAGE ESTs were derived from the normalized
libraries described here.

The normalization procedure (method 1)
that we described previously (Soares et al. 1994)
was applied to the construction of the 1NIB
and 1NFLS normalized libraries, from which a
total of 45,192 and 86,088 ESTs, respectively,
have been derived (dbEST release 0.2396, http://




[Figure 4 diagram. Double-stranded plasmid DNA from a pool of ~5,000
liver-spleen clones: Gene II protein; amplification products.
Double-stranded plasmid DNA from normalized liver-spleen library:
Gene II protein. Blocking oligonucleotides; hybridization (Cot ~27);
HAP chromatography; HAP flow-through (subtracted library enriched
for clones not represented in the original pool of ~5,000 clones);
conversion to double-stranded circles; electroporation (amplified
subtracted library: 1NFLS-S1).]

Figure 4 Diagram of the subtractive hybridization
procedure used to generate the 1NFLS-S1 library.
Double-stranded plasmid DNA from a pool of
~5000 IMAGE Consortium-arrayed cDNA clones
derived from the 1NFLS library (pool no. 1, LLAM
78-90) was converted to single-stranded circles in vitro
by the combined action of Gene II and Exonuclease
III (Life Technologies). The resulting single-stranded
plasmids were HAP-purified and used as a template
for PCR amplification with T3 and T7 primers.
PCR-amplified cDNA inserts were purified from excess
primers, melted, and hybridized with
single-stranded circles (prepared in vitro) from the 1NFLS
library, in the presence of appropriate blocking
oligonucleotides. The remaining single-stranded
circles were purified by HAP chromatography,
converted to double-stranded plasmids, electroporated
into DH10B bacteria (Life Technologies), and
propagated under ampicillin selection to generate the
(1NFLS-S1) subtracted library.



www.ncbi.nlm.nih.gov). Data analysis (see
Hillier et al., this issue) demonstrated solidly the
efficacy of this approach in bringing the
frequency of all clones to within a narrow range.
Extensive characterization of these two libraries
by Southern analysis, however, revealed that on



Figure 5 Characterization of the 1NFLS-S1
subtracted liver/spleen library by Southern
hybridization with 5 cDNA probes. The 0.15 µg PacI + EcoRI-
digested plasmid DNA from the fetal liver/spleen
library normalized with method 1 (1NFLS; lane 1),
from the pool of ~5000 IMAGE Consortium-arrayed
cDNA clones derived from the 1NFLS library (pool
no. 1, LLAM 78-90; lane 2), from the subtracted
library generated according to the diagram shown
in Fig. 4 (1NFLS-S1; lane 3), and from the HAP-
bound fraction obtained during HAP purification of
the 1NFLS-S1 library (see Fig. 4) were
electrophoresed, transferred to nylon membranes, and
hybridized as described in the legend to Fig. 1. The
following cDNA probes were used: α-globin (A),
γ-globin (B), serum albumin (C), unknown cDNA 7 (D;
picked randomly from pool no. 1, LLAM 78-90), and
unknown cDNA 5 (E; picked randomly from pool
no. 1, LLAM 78-90). A BLASTN search of the dbEST
subdivision of GenBank with 3' ESTs derived from
cDNA 7 and cDNA 5 revealed the presence of 33
and 0 corresponding ESTs, respectively, from the
1NFLS library. All probes were contaminated
intentionally with a small amount of vector DNA to
enable visualization of vector bands and thus confirm
that a similar amount of library DNA was loaded in
all lanes. (V) vector band; (U) residual undigested
plasmid.



occasion truncated clones were favored over their 
longest counterparts during the normalization 
procedure. 

Because of the relatively permissive
conditions used for synthesis of first-strand cDNA,
priming with the NotI-tag-(dT)18 oligonucleotide
may occur not only at the poly(A) tail of the
mRNAs but also at internal A-rich sites within the
mRNAs (e.g., at Alu tails). Typically, cDNAs with
3' truncations occur at frequencies of 10-15% in
directionally cloned libraries. Truncated clones
can be recognized (tentatively) as such by the
absence of a bona fide polyadenylation signal
sequence at the appropriate distance upstream
from the oligo(dA)18 tail of the cDNA.
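
The tentative test described above can be sketched in code. The following is a minimal illustration rather than the procedure used in this work: it assumes the canonical hexamers AATAAA and ATTAAA as the polyadenylation signal and a search window of 35 nucleotides upstream of the tail, neither of which is specified in the text. The function names and the two example sequences are hypothetical.

```python
import re

# Assumptions of this sketch (not taken from the paper): the canonical
# polyadenylation hexamers and the size of the upstream search window.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def split_poly_a_tail(seq, min_tail=10):
    """Split a sense-strand cDNA sequence into (body, 3' oligo(dA) tail).

    Returns (sequence, "") when no run of at least `min_tail` A residues
    is found at the 3' end.
    """
    seq = seq.upper()
    match = re.search(r"A{%d,}$" % min_tail, seq)
    if match is None:
        return seq, ""
    return seq[: match.start()], seq[match.start():]


def has_polya_signal(seq, max_upstream=35, min_tail=10):
    """True if a signal hexamer lies within `max_upstream` nt of the tail."""
    body, tail = split_poly_a_tail(seq, min_tail)
    if not tail:
        return False
    region = body[-max_upstream:]
    return any(signal in region for signal in POLYA_SIGNALS)


# Hypothetical example sequences: one with a signal 16 nt upstream of the
# tail, one whose tail follows sequence lacking any signal (as expected for
# a clone primed at an internal A-rich site).
full_length = "GCGT" * 5 + "AATAAA" + "GCTTGCTTGCTTGCTC" + "A" * 18
internally_primed = "GCGT" * 10 + "GCTC" + "A" * 18

print(has_polya_signal(full_length))
print(has_polya_signal(internally_primed))
```

A positive result is only consistent with a bona fide 3' end; as the text notes, the classification remains tentative.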

Why might truncated cDNAs be favored over
their longest counterparts during normalization 
by method 1? Briefly, method 1 (Soares et al. 
1994) involves: (1) annealing of a single-stranded 
DNA preparation of a directionally cloned cDNA 
library with an oligo(dT),» primer; (2) controlled 
primer extension reactions in the presence of de- 
oxynucleotides and dideoxynucleotides to gener- 
ate 3' noncoding extension products of approxi- 
mately 200-300 nucleotides; (3) purification of 
the resulting partially double-stranded circles by 
HAP chromatography; (4) melting and reassocia- 
tion of the HAP-purified partially double- 
stranded circles to a relatively low Cot (5-10); (5) 
purification of the remaining single-stranded 
circles (normalized library) over HAP; (6) conver- 
sion of the single-stranded circles to double- 
stranded circles; and (7) electroporation into bac- 
teria. 

It could be anticipated that during the reas- 
sociation reaction, because truncated cDNAs oc- 
cur at lower frequencies than their nontruncated 
counterparts, the extension products of the trun- 
cated cDNAs would more likely reanneal to the 
nontruncated overlapping cDNAs than to their 
own truncated templates. On the other hand, the
extension products of the nontruncated cDNAs 
would most likely reassociate to their own non- 
truncated templates not only because they are 
more prevalent but also because of the low prob- 
ability of there being an overlap between the 
short extension product of a nontruncated clone 
and a truncated single-stranded circle. As a result, 
nontruncated single-stranded circles are more 
likely to end up reassociated with more than one 
(nonoverlapping) extension product, whereas 
their truncated counterparts would remain 
single-stranded and therefore end up in the HAP 
flow-through fraction (normalized library). 

Reasoning that this problem could be cir- 
cumvented if the hybridizing fragments (1) were 
in excess over single-stranded circles, and (2) 
spanned the entire length of the cDNAs to maxi- 
mize the opportunity of overlap between trun- 
cated and nontruncated clones, we devised an 
approach (methods 2-1 and 2-2; note that 2-2 is 
the same as 2-1 except that hybridization condi- 
tions were different) whereby in vitro synthe- 
sized RNA from a plasmid DNA preparation of a 
starting library is used as driver in hybridization 
(Cot ~5) with the same library in the form of
single-stranded circles. Indeed, these
modifications successfully improved the representation of
the longest cDNAs in the normalized libraries
(e.g., serum albumin in the liver/spleen libraries).

However, in every library constructed with
methods 2-1 and 2-2, we were able to identify
cDNA clones that seemed to become normalized
with much greater difficulty than others (e.g.,
α-globin in the 5Nb2HFLS20W liver/spleen
library, and G3PD in the breast library). We
interpreted these results as suggesting that not all
clones are transcribed in vitro with the
same efficiency when in a mixture (i.e., in vitro
transcription of plasmid DNA from an entire library),
and/or that secondary structures in the RNAs (or
interactions between RNAs) might impair their
ability to hybridize with the single-stranded
circles. These hypotheses were corroborated by
the observation (not shown) that relatively weak
hybridization signals were obtained when
Northern blots of RNA transcribed in vitro from an
entire plasmid library were hybridized with cDNA
probes derived from those clones that could not
be normalized as effectively, despite the fact that
they occurred at high frequencies in the starting
libraries from which the in vitro transcribed
RNAs were synthesized. We did exclude the
possibility that the clones that were not being
normalized effectively carried deletions that
prevented them from being transcribed
appropriately in vitro (not shown). In fact, all clones that
were tested individually for in vitro transcription
yielded the expected amounts of full-length RNA.
Although this problem was significantly
minimized in method 2-3 (cf. lanes 1,2 in Fig.
1A-D,F,G), the extent of normalization that was
achieved was still not comparable to that
obtained with method 1 (cf. lanes 2,4 in Fig.
1A-D,F,H).

The advantage of method 2-3 over methods 
2-1 and 2-2 is that the RNA driver is derived from 
a mini-library (of relatively low complexity) en- 
riched for abundant cDNAs rather than from the 
entire starting library. For this reason, higher Cot 
hybridizations can be carried out to eliminate or 
reduce significantly the representation of the 
most abundant cDNAs. It should be noted, how- 
ever, that method 2-3 is not a true normalization 
procedure, because the aim of this approach is 
not to equalize the frequency of all cDNA clones 
but rather to reduce significantly (or even to 
eliminate, depending on the Cot used) the repre- 
sentation of the most abundant clones. 

The extent to which the enrichment for
abundant transcripts can be achieved in such
mini-libraries depends essentially on the Cot used
for reassociation. Calculations based on estimates
of frequencies of brain mRNAs (Soares et al. 1994)
indicate that the best enrichments are obtained
at a Cot = 5-10. If the Cot is too low (<<1) the
enrichment is only for the most prevalent (class
I) mRNAs; there is no enrichment for the mRNAs
of the intermediate frequency class (class II).
On the other hand, if the Cot is too high
(≥50) the enrichment for class I transcripts starts
to become less significant because of a higher
representation of mRNAs of the complex class (class
III). Prevalent and intermediate (classes I + II)
brain mRNAs comprise 93-95% of the total
cDNA population in a Cot = 5-10 HAP-bound
mini-library, in contrast to 62% in the starting
library. Consequently, the frequency of class III
transcripts in a Cot = 5-10 HAP-bound mini-
library is about 5.5-fold lower than that of the
starting library (5-7% in the bound mini-library
vs. 38% in the starting library).
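
The dependence on Cot described above can be illustrated with ideal second-order reassociation kinetics, under which a species present at fractional abundance f has reannealed to the extent k·f·Cot / (1 + k·f·Cot) at a given total Cot. The sketch below is a simplified illustration and not the calculation of Soares et al. (1994): the rate constant `k` and the per-species abundances assigned to the three frequency classes are hypothetical values chosen only to show the trend. The final lines simply recompute the fold change in class III representation from the percentages quoted in the text.

```python
def fraction_reassociated(freq, cot, k=1000.0):
    """Fraction of one species that has reannealed (ideal second-order kinetics).

    freq: fractional abundance of the species in the library (0..1)
    cot:  total Cot of the reaction
    k:    hypothetical rate constant; a real value depends on sequence
          length, salt, and temperature, which this sketch ignores
    """
    x = k * freq * cot
    return x / (1.0 + x)


# Hypothetical per-species abundances for the three frequency classes.
CLASS_FREQ = {"class I": 1e-2, "class II": 1e-3, "class III": 1e-5}

for cot in (0.1, 5.0, 50.0):
    row = ", ".join(
        f"{name}: {fraction_reassociated(freq, cot):5.1%}"
        for name, freq in CLASS_FREQ.items()
    )
    print(f"Cot {cot:>4}: {row}")

# Fold change in class III representation, from the percentages in the text:
# 38% in the starting library vs. 5-7% in the HAP-bound mini-library.
START_CLASS_III = 38.0
BOUND_CLASS_III = (5.0, 7.0)
fold_range = [START_CLASS_III / bound for bound in BOUND_CLASS_III]
print(f"class III is {fold_range[1]:.1f}- to {fold_range[0]:.1f}-fold lower")
```

With these illustrative values, a very low Cot drives mainly the most abundant class into the bound fraction, an intermediate Cot captures classes I and II while leaving most of class III single-stranded, and a high Cot begins to pull class III into the bound fraction as well. The figure of about 5.5-fold in the text corresponds to the 7% end of the quoted range.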

Methods 3 and 4 were developed as a result 
of an attempt to achieve both the adequate ex- 
tent of normalization obtained with method 1 
and the improved representation of the longest 
cDNAs accomplished with methods 2-1, 2-2, and 
2-3. Although more technically cumbersome, 
method 3 is superior to method 4 in that the 
DNA driver used in the hybridization is single- 
stranded. 

Single-stranded driver in method 3 (see Fig.
3) is generated by Exonuclease III digestion of
gel-purified double-stranded cDNA inserts
excised from the starting library. The resulting
noncomplementary single-stranded fragments
represent the 5' and 3' halves of the original cDNA
inserts. The fragments that correspond to the 5'
halves of the cDNAs are complementary to
single-stranded circles prepared in vitro, whereas
the single-stranded fragments that correspond to
the 3' halves of the cDNA inserts are
complementary to single-stranded plasmids prepared in vivo.
Note that for the multiple sclerosis plaques
library constructed with method 3 we used
single-stranded circles prepared in vitro.

Production of single-stranded circles in vitro
by the combined action of Gene II and
Exonuclease III (Life Technologies), rather than in vivo by
superinfection of a culture with a helper phage, is
very beneficial because it circumvents the
distortions that otherwise may arise as a result of the
differential growth properties of clones with
different size inserts. However, because the
digestion with Gene II results in the conversion of
most, but not all, supercoiled plasmids to relaxed
circles, it becomes necessary to purify the
single-stranded circles that are produced after digestion
with Exonuclease III by HAP chromatography.

For construction of the normalized multiple
sclerosis plaques library, the cDNA inserts were
excised by double digestion of plasmid DNA from
the starting library with NotI and EcoRI. The fact
that one in every three clones might have an internal
EcoRI site (an EcoRI site is expected to occur
once every 4096 bp, and the average insert
size in these libraries is of the order of 1.4 kb)
should not compromise the efficiency of the procedure,
because at least one of the resulting restriction
fragments would be expected to be
>200 bp (clones smaller than 400 bp are size-selected
out of these libraries) and therefore be
able to form hybrids that would bind quantitatively
to HAP under our conditions. A disadvantage
of method 3, as presented, is that only
clones <2.9 kb (approximate vector size) can be
excised cleanly from the vector. It is conceivable,
however, that one might be able to use double-stranded
cDNA fragments generated by PCR amplification
with T3 and T7 primers as substrate
for the Exonuclease III digestion in method 3.
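
The estimate that roughly one clone in three carries an internal EcoRI site can be checked with a short calculation. The sketch below assumes equal base frequencies and a Poisson distribution of sites along the insert; both are simplifying assumptions, not statements from the original work.

```python
import math

SITE_LENGTH = 6        # EcoRI recognizes a 6-bp sequence (GAATTC)
MEAN_INSERT_BP = 1400  # average insert size quoted in the text (~1.4 kb)

# With equal base frequencies, a given 6-mer occurs on average once every 4**6 bp.
expected_spacing_bp = 4 ** SITE_LENGTH

# Expected number of internal sites in an insert of average size.
expected_sites = MEAN_INSERT_BP / expected_spacing_bp

# Poisson approximation for the chance that an insert has at least one site.
p_at_least_one = 1.0 - math.exp(-expected_sites)

print(expected_spacing_bp)
print(round(expected_sites, 2), round(p_at_least_one, 2))
```

The expected number of sites per average insert (about 0.34) is the "one in every three" figure; the Poisson probability of at least one site is slightly lower because some inserts carry more than one.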

Method 4 was used to generate a significant 
fraction of the libraries that were contributed to 
the IMAGE Consortium (see Table 1). It is un- 
doubtedly the simplest and overall most advan- 
tageous of all procedures. Because the DNA driver 
is generated by PCR amplification of the starting 
(double-stranded or single-stranded, see below) 
plasmid library with T3 and T7 primers, the tracer 
(single-stranded circles) used in this hybridiza- 
tion may be produced in vitro or in vivo. 

The extent of normalization achieved with
method 4 was comparable to that obtained with
method 1, with the advantage that it successfully
preserved the representation of the longest
cDNAs (cf. lanes 4,5 in Fig. 1). Moreover, method
4 is superior to method 1 because it does not
preclude the clones derived from mRNAs with
internal NotI sites from being represented in the
normalized library. Because the starting material
for the reassociation kinetics reaction in method
1 is generated by a controlled primer extension
reaction with an oligo(dT)18 primer, clones without
an oligo(dA)18 tail (derived from mRNAs with
an internal NotI site) are not represented in the
final normalized library, although they are not
necessarily lost (clones without tails end up in
the HAP flow-through fraction during HAP purification
of the partially double-stranded circles



800 GENOME RESEARCH

cDNA-BASED APPROACHES TO FACILITATE GENE DISCOVERY



generated by this primer extension reaction). It
should also be noted that this problem of method
1 could be circumvented by the use of an oligonucleotide
complementary to flanking vector sequences
[as opposed to the oligo(dT)18] for this
controlled primer extension reaction.

The potential biases introduced by PCR amplification
in method 4 are minimized by the fact
that (1) PCR amplification products are used in
excess in these hybridizations, and (2) the size dis- 
tribution of inserts in these libraries is relatively 
narrow (ranging typically from 0.4 to 2.5 kb). 

The conditions used for hybridization greatly 
influenced the quality of the resulting normal- 
ized libraries constructed with method 4. This is 
to a great extent a consequence of the fact that 
we are using HAP to purify single-stranded circles,
as opposed to a biotin-avidin capture system, 
which in our hands yielded significantly less satis- 
factory results (M.F. Bonaldo and M.B. Soares, un- 
publ.). The best results were obtained when the hy- 
bridization conditions were the most similar to the 
HAP conditions. We interpreted these results as 
suggestive of the fact that imperfect hybrids 
formed during hybridization may either not bind 
to HAP and/or may melt once in the HAP buffer.

It is noteworthy that a much superior extent 
of normalization was obtained with method 4 
when single-stranded plasmid DNA prepared in 
vitro, as opposed to double-stranded plasmid 
DNA, was used as template for PCR amplification
(not shown). These results suggest that a fraction
of the double-stranded plasmids used as template
for PCR amplification, presumably in the form of
melted supercoiled DNA, might end up in the 
HAP flow-through fraction (normalized library) 
during purification. 

It is noteworthy that cross-hybridizing di- 
verged sequences seem to escape normalization 
in all of the procedures discussed above. For ex- 
ample, the frequency of Alu repeat-containing 
cDNAs (typically 10% in directionally cloned 
cDNA libraries) is practically the same in starting 
and normalized libraries. These results suggest 
that imperfect hybrids either do not bind to 
HAP under our conditions or melt once diluted 
in the (more stringent) HAP buffer. This is advan- 
tageous, not only because it preserves the repre- 
sentation of Alu-containing cDNAs that might 
correspond to otherwise rare mRNAs, but also, 
and most significant, because it minimizes the
likelihood that a rare member of a gene family 
might be excluded from the final (normalized 
or subtracted) library as a result of a cross-hybridization
with a more prevalent but diverged
sequence.

The use of normalized libraries for large-scale
gene discovery/EST programs is beneficial because
it minimizes redundancies while increasing
the representation of the rarer cDNAs by about
threefold, on average. However, given the great
extent of overlap in gene expression among different
tissues, the use of normalized libraries
alone is not sufficient to maintain a desirable
pace of identification of novel sequences at advanced
stages of such programs. For this reason,
we propose that the use of subtracted libraries
enriched for clones not yet identified might become
increasingly advantageous. Toward this
goal, we have developed a subtractive hybridization
approach designed specifically for this purpose
(see Fig. 4). In a pilot experiment, we were
able to reduce significantly the representation of
~5000 1NFLS-IMAGE Consortium clones from
the 1NFLS library itself (see Fig. 5). With the development
of appropriate clustering algorithms,
the use of nonredundant sets of cDNA/gene sequences
as drivers for hybridizations to generate
subtractive libraries enriched for novel sequences
should soon become possible, and hopefully will
facilitate the isolation of all human and mouse
cDNAs still awaiting identification.



METHODS 

Construction of Directionally Cloned cDNA 
Libraries 

Poly(A)+ RNA was purified from total cellular RNA (except
for senescent fibroblasts, from which cytoplasmic RNA was
isolated) using the Oligotex mRNA kit (Qiagen) according
to the manufacturer's instructions, except that two rounds
of purification were performed. cDNA library construction
was essentially as described before (Adams et al. 1993b;
Soares et al. 1994). Typically, 1 µg poly(A)+ RNA was annealed at
37°C with a twofold mass excess of a NotI-tag-(dT)18
primer [or PacI-tag-(dT)18 in the case of the liver/spleen
library] and reverse transcribed at 37°C with SuperScript
Reverse Transcriptase (Life Technologies). Alternatively,
poly(A)+ RNA was annealed at 45°C with a fourfold mass
excess of a NotI-tag-(dT)18 primer and reverse transcribed
at 45°C. The tag is a sequence of 2-6 nucleotides that is
unique for each library and thus serves as a library
identifier (see Table 1). With the exception of infant brain,
fetal liver/spleen, and term placenta, all other first-strand cDNA
syntheses were primed with the following oligonucleotide:
5'-TGTTACCAATCTGAAGTGGGAGCGGCCGC-tag-(dT)18-3'.
The oligonucleotide 5'-AACTGGAAGAATTCGCGGCCGCAGGAA(T)18-3'
(Pharmacia) was used for both infant brain and term
placenta first-strand syntheses.
The oligonucleotide 5'-AACTGGAAGAATTAATTAAAGATCT(dT)18-3'
was used to prime the synthesis of first-



BONALDO ET AL. 

strand fetal liver/spleen cDNA. Double-stranded cDNAs
were size-selected by gel filtration over a long (64 cm) and
narrow (0.2-cm diameter) Bio-Gel A-50m (Bio-Rad, 100-
200 mesh) column, and ligated to a 500- to 1000-fold molar
excess of adapters. Infant brain cDNAs were ligated to
HindIII adapters, digested with NotI, size-selected over a
second Bio-Gel column, and cloned directionally into the
NotI and HindIII sites of the Lafmid BA vector (Soares et al.
1994). Fetal liver/spleen cDNAs were ligated to EcoRI
adapters (Pharmacia), size-selected as above, digested with
PacI, and cloned directionally into the PacI and EcoRI sites
of the pT7T3-Pac vector. All other cDNAs were ligated to
EcoRI adapters (Pharmacia), size-selected as above, digested
with NotI, and cloned directionally into the NotI and EcoRI
sites of the pT7T3-Pac vector. pT7T3-Pac is essentially the
same as pT7T3 18D (Pharmacia) with a modified
polylinker. Figure 6 shows the sequence of the pT7T3-Pac
polylinker and flanking sequences.



Production of Purified Covalently Closed 
Single-stranded Library DNA in Vitro 

Double-stranded phagemid DNA was converted to single-stranded
circles by the combined action of Gene II (phage
f1 endonuclease) and Escherichia coli Exonuclease III enzymes,
as per the manufacturer's instructions (Life Technologies;
cat. no. 10356-020). The resulting single-stranded
circular DNA was purified from the remaining
double-stranded plasmids by HAP chromatography (Bio-Rad)
as described previously (Soares et al. 1994). The replication
initiator protein of bacteriophage f1 (Gene II) is a
site-specific endonuclease that binds to the f1 origin in
phagemid vectors and nicks the viral strand of the supercoiled
DNA. The nicked strand is then digested from its 3'
end with Exonuclease III (Hoheisel 1993) to generate
single-stranded circles. Purification of the resulting single-stranded
circles over HAP is necessary because the conversion
of supercoiled to relaxed plasmids by Gene II is never
complete. The Gene II reaction was performed for 1 hr at
30°C and contained typically 4 µg supercoiled plasmid library
DNA, 1 µl Gene II (Life Technologies), and 2 µl
10x Gene II buffer (Life Technologies) in a total volume of
20 µl. The Gene II protein was heat inactivated for 5 min
at 65°C; the reaction mixture was chilled on ice; 2 µl Exonuclease
III (Life Technologies, cat. no. 18013-011, 65
units/µl) was added; and the reaction was incubated for 30
min at 37°C. Gene II and Exonuclease III were then digested
with Proteinase K (Boehringer Mannheim) for 15
min at 50°C in a 100-µl reaction containing 10 mM Tris
(pH 7.8), 5 mM ethylenediamine tetraacetic acid (EDTA),
0.5% SDS, and 136 µg Proteinase K. After extraction with
an equal volume of phenol-chloroform-isoamyl alcohol (25:24:1),
library DNA was ethanol-precipitated and digested
with PvuII for 2 hr at 37°C. This was done to convert the
remaining supercoiled plasmids into linear DNA molecules
and thereby improve their binding to HAP under
our conditions. Note that PvuII does not cleave single-stranded
circles and that there are two PvuII sites in the
vector. The reaction was diluted with 2 ml loading buffer
[0.12 M sodium phosphate buffer (pH 6.8), 10 mM EDTA,
and 1% SDS] and purified by HAP chromatography at
60°C, using a column pre-equilibrated with the same
buffer (1-ml bed vol.; 0.4 g of HAP). After a 6-ml wash with
loading buffer, this volume was combined with the flow-through
fraction, and the sample was extracted twice with
water-saturated 2-butanol, once with dry 2-butanol, and
once with water-saturated ether (3 vols. per extraction).
Residual ether was blown off by vacuum and the sample
was desalted by passage through a Nensorb column (DuPont/NEN)
according to the manufacturer's specifications,
concentrated down to ~0.35 ml, and ethanol-precipitated.
Note that Gene II-Exonuclease III-prepared single-stranded
DNA is in the opposite polarity to single-stranded
DNA generated by in vivo phagemid production.



Production of Purified Covalently Closed 
Single-stranded Library DNA in Vivo 

Plasmid DNA from the starting library was electroporated
into E. coli DH5αF' bacteria, and the culture was grown
under ampicillin selection at 37°C to an OD600 of 0.2,
superinfected with a 10- to 20-fold excess of the helper phage
M13K07 (Pharmacia), and harvested after 4 hr for preparation
of single-stranded plasmids, as described (Vieira and
Messing 1987).



Conversion of Single-stranded Circles to 
Double-stranded Plasmids 

Single-stranded circles (<50 ng) were ethanol-precipitated
and resuspended in 11 µl water. Then 4 µl 5x Sequenase
buffer (USB) and 1 µl primer (1 µg) were added and the
mixture was incubated at 65°C for 5 min and then at 37°C
for 3 min. Then 1 µl Sequenase version 2.0 (USB), 1 µl 0.1
M dithiothreitol (DTT), and 2 µl mixed dNTP stock (a solution
containing each deoxynucleotide at a final concentration
of 10 mM) were added, and the reaction was incubated
at 37°C for 30 min. The total



5'-caccccaggctttacactttatgcttccggctcgtatgttgtgtggaattgtgagcggataacaatttcacacaggaaacagctatg
   (M13 Reverse Sequencing Primer)
[illegible]ttacgaatttaatacgactcactatagggaatttGGCCCTCGAGGCCAAGAATTCCCGAC[illegible]
   (T7 Promoter, SfiI, EcoRI, SnaBI)
GTCGGGGATCCGTCTTAATTAAGCGGCC[illegible]
   (BamHI, PacI, NotI, HindIII, T3 Promoter)
tggccgtcgttttacaacgtcgtgactgggaaaaccctggcgttacccaacttaatcgccttgcag-3'
   (M13 Sequencing Primer)

Figure 6 Sequence of the pT7T3-Pac polylinker (uppercase) and flanking
sequences (lowercase).



volume was taken up to 100 µl with
10 mM Tris (pH 8.0) and 1 mM EDTA
(TE) and the reaction was extracted
once with phenol-chloroform-isoamyl
alcohol (25:24:1). Plasmid
DNA was ethanol-precipitated and
dissolved in 3 µl TE. The following
oligonucleotides were used for this
primer extension reaction: (1) M13
Reverse Sequencing Primer
(5'-AGCGGATAACAATTTCACACAGGA-3'),
which is complementary
to single-stranded circles prepared in vitro,





and (2) Oligo-Amp (5'-GACTGGTGAGTACTCAACCAAGTC-3'),
which is complementary to the ampicillin resistance
gene of single-stranded pT7T3-Pac or Lafmid BA
plasmids prepared in vivo.



In Vitro Synthesis of Library RNA 

Some 2-5 µg of double-stranded plasmid DNA from either
the starting library (see methods 2-1 and 2-2 below) or the
mini-library of abundant cDNAs (see method 2-3 below)
was linearized with either PacI (NEB) or NotI (NEB) and
used as a template for synthesis of RNA with RiboMax
Large Scale RNA Production Systems T7 or T3 (Promega),
according to the manufacturer's instructions. After treatment
with ribonuclease-free DNase I (Promega) to digest
away the plasmid DNA template, the RNA was used for
hybridization as described below. It should be noted that
RNA synthesized with T7 RNA Polymerase is in the message-like
orientation and is complementary to the single-stranded
circles produced in vitro. On the other hand,
RNA synthesized with T3 RNA Polymerase is in the antimessage
orientation and is complementary to single-stranded
circles produced in vivo.
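
Because single-stranded circles made in vitro and in vivo have opposite polarity, each needs its own complementary RNA driver and extension primer. The lookup below is only an illustrative way to keep those pairings straight; the dictionary layout and function name are assumptions for this sketch, while the pairings themselves are those stated in the Methods text.

```python
# Reagents stated in the text to be complementary to single-stranded circles
# of each origin. Keys and field names are hypothetical, chosen for this sketch.
COMPLEMENTARY_TO = {
    "in vitro": {  # circles made with Gene II plus Exonuclease III
        "rna_polymerase": "T7",  # message-like RNA
        "extension_primer": "M13 Reverse Sequencing Primer",
    },
    "in vivo": {  # circles rescued with helper phage
        "rna_polymerase": "T3",  # antimessage RNA
        "extension_primer": "Oligo-Amp",
    },
}


def driver_polymerase(circle_origin: str) -> str:
    """Return the RNA polymerase whose transcript pairs with the given circles."""
    return COMPLEMENTARY_TO[circle_origin]["rna_polymerase"]


print(driver_polymerase("in vitro"))
print(driver_polymerase("in vivo"))
```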



Normalization Method 1 

The procedure used for construction of the normalized
human infant brain (1NIB) library (here designated as
method 1) has been described previously (Soares et al.
1994). Method 1, with minor modifications, was also applied
to construct the normalized human fetal liver/spleen
cDNA library (1NFLS). To synthesize a partial second
strand of about 200 nt by limited extension, a 100-µl reaction
mixture containing 5 µl 0.5 µg/µl PvuII-digested,
HAP- and gel-purified single-stranded plasmid DNA from
the fetal liver/spleen starting library, 7 µl 10 ng/µl
oligo(dT)12-18 (Pharmacia), 10 µl 10x Primer Extension Buffer
[0.3 M Tris (pH 7.5), 0.5 M NaCl, and 0.15 M MgCl2], 10 µl
0.1 M DTT, 10 µl mixed dNTP stock, 25 µl mixed ddNTP
stock (a solution containing each dideoxy A, C, and G at a
final concentration of 25 mM), 5 µl 800 Ci/mmole
[α-32P]dCTP, and 20.5 µl water was incubated at 60°C for 5
min, at 50°C for 15 min, and at 37°C for 2 min. Then 7.5
µl 5 units/µl Klenow enzyme (USB) was added, and the
reaction was incubated at 37°C for 30 min. The reaction
was extracted with phenol-chloroform-isoamyl alcohol
(25:24:1), 5 µg melted and sheared salmon sperm DNA was
added, and the partially double-stranded plasmids were
purified from the remaining single-stranded circles (unprimed
molecules, as well as clones derived from mRNAs
with an internal PacI site that therefore do not contain an
oligo(dA) tail at the 3' end) by HAP chromatography. The
HAP-bound fraction containing the partially double-stranded
plasmids was eluted with 6 ml 0.4 M sodium
phosphate buffer (pH 6.8), 10 mM EDTA, and 1% SDS, and
plasmid DNA was desalted as described before (Soares et al.
1994) and ethanol-precipitated. The DNA (173 ng) was
resuspended in 2.5 µl deionized formamide and melted at
80°C for 3 min under 10 µl mineral oil. Then 1 µl of 5
µg/µl oligo(dT)12-18 (used to block the tails) was added,
and the mixture was heated at 80°C for 1 min. Then 0.5 µl
5 M NaCl, 0.5 µl 10x TE, and 0.5 µl water were added, and
the reassociation reaction was incubated at 42°C for 0.6 hr
(calculated Cot = 0.5). The remaining single-stranded
circles were purified over HAP (flow-through fraction) and
subjected subsequently to a second cycle of the normalization
procedure as described above, except that reassociation
was conducted for 24 hr (calculated Cot = 20). The
remaining single-stranded circles (normalized library;
1NFLS) were purified over HAP, converted to double-stranded
plasmids, electroporated into DH10B bacteria,
and propagated under ampicillin selection.
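
Cot is the product of the initial nucleotide concentration (mol/l) and the reassociation time (s). The sketch below computes a nominal, uncorrected value from a DNA mass, a reaction volume, and a time. The average nucleotide mass of 330 g/mol and the function name are assumptions of this sketch, and the calculated values quoted in the text may include rate corrections (for example for salt or formamide) that are not given in this section, so absolute agreement is not expected.

```python
AVG_NUCLEOTIDE_G_PER_MOL = 330.0  # assumed average mass of one nucleotide


def nominal_cot(mass_ng: float, volume_ul: float, hours: float) -> float:
    """Nominal Cot (mol nucleotide * s / l) with no rate corrections applied."""
    grams_per_litre = (mass_ng * 1e-9) / (volume_ul * 1e-6)
    molar_nucleotides = grams_per_litre / AVG_NUCLEOTIDE_G_PER_MOL
    return molar_nucleotides * hours * 3600.0


# First cycle of method 1: 173 ng in a 5-ul reaction
# (2.5 + 1 + 0.5 + 0.5 + 0.5 ul) for 0.6 hr.
first_cycle = nominal_cot(173, 5.0, 0.6)

# If the concentration were the same in a second cycle, Cot would scale
# only with time (24 hr instead of 0.6 hr).
second_cycle = nominal_cot(173, 5.0, 24.0)

print(round(first_cycle, 3), round(second_cycle / first_cycle))
```

Under these assumptions the ratio between the two cycles is 40, the same as the ratio of the calculated values reported in the text (20 versus 0.5), since both depend only on the ratio of incubation times.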

Normalization Methods 2-1, 2-2, and 2-3 

Method 2 is a reassociation kinetics-based approach involving
hybridization of in vitro synthesized RNA (the
driver) derived either from the entire library (methods 2-1
and 2-2; see Fig. 2) or from a mini-library enriched for
abundant cDNAs (method 2-3; see Fig. 2), with the whole
starting library in the form of single-stranded circles (the
tracer). The remaining single-stranded circles (normalized
library) are purified by HAP chromatography (HAP flow-through
fraction), converted to double-stranded plasmids
for improvement of electroporation efficiency, electroporated
into DH10B bacteria (Life Technologies), and propagated
under ampicillin selection. A number of normalized
cDNA libraries were constructed with these methods using
single-stranded plasmids prepared both in vivo and in
vitro (see Table 1). In all three variants, the driver was first
pre-annealed with a pair of oligonucleotides to block both
5' and 3' vector sequences as follows: 0.5 µl (10 µg) of each
oligonucleotide, 1 µl RNA (5.0 µg in methods 2-1 and 2-3;
0.5 µg in method 2-2), and 4.0 µl deionized formamide
were heated for 3 min at 80°C under 10 µl mineral oil and
quickly chilled on ice. Then 0.8 µl 10x hybridization
buffer [0.4 M Pipes (pH 6.4), 4 M NaCl, and 10 mM EDTA in
methods 2-1 and 2-3; 0.4 M Pipes (pH 6.4), 1.2 M NaCl, 10
mM EDTA, and 1% SDS in method 2-2], 0.5 µl RNAsin
(Boehringer Mannheim), and 0.7 µl water were added
and the mixture (total volume, 8 µl) was incubated overnight
at 42°C (methods 2-1 and 2-3) or 30°C (method 2-2).
In another tube, 2.5 µl (50 ng) single-stranded library DNA
in deionized formamide was heated for 3 min at 80°C under
mineral oil; 0.5 µl 10x hybridization buffer and 2.0 µl
water were added; and the mixture was transferred to the
tube containing the preannealed RNA. Hybridization (13-µl
reaction) was performed at 42°C (method 2-1: Cot =
5-10; method 2-3: Cot = 100-200) or at 30°C (method 2-2:
Cot = 5-10). The driver, rather than the tracer, was blocked
because otherwise the latter would, to some extent, bind to
HAP during purification. The plasmid mini-library enriched
for abundant cDNAs that served as a template for
the synthesis of RNA used as driver in method
2-3 was prepared from the HAP-bound fraction obtained
during purification of the normalized library in method
2-1. Different pairs of blocking oligonucleotides were used,
depending on whether the RNA was synthesized with
T3 or T7 RNA polymerases. To block RNA synthesized with
T3 RNA polymerase, which was used in hybridizations
with single-stranded plasmids prepared in vivo, we used:
5'.,oAGGGCGGCCGCAAGCTTATTCCCTTTAGT. 

GAGGGTTAAT-3' (this oligonucleotide was used to block
5' vector sequences of all but the human fetal liver/spleen
library RNA), and 5'-[illegible]AGATCTTTAATTAAGCGGCCGC-
AAGCTTATTCCCTTTAGTGAGGGTTAAT-3' (this oligonucleotide
was used to block 5' vector sequences of the




human fetal liver/spleen library RNA), and
5'-AGGCCAAGAATTCGGCACGAG-3' (this oligonucleotide was
used to block 3' vector sequences). To block RNA synthesized
with T7 RNA polymerase, which was used in hybridizations
with single-stranded plasmids prepared in vitro,
we used: 5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCC-3'
(this oligonucleotide was used to
block 5' vector sequences). The oligonucleotide used to
prime the synthesis of first-strand cDNA was also used to
block 3' vector sequences.



Normalization Method 3 

Method 3, used to generate the normalized library from
multiple sclerosis plaques (2NbHMSP), is a reassociation-kinetics-based
approach involving hybridization (Cot =
20-25) of a 20-fold excess of Exonuclease III-digested
cDNA inserts excised from a plasmid DNA preparation of
the starting library with the library itself in the form of
single-stranded circles, followed by HAP purification of
the remaining single-stranded plasmids, conversion to
double-strands, and electroporation into bacteria. Some 5
µg double-stranded plasmid DNA from the starting library
was doubly digested with NotI and EcoRI; the excised
cDNA inserts were separated from the cloning vector by
agarose gel electrophoresis; and the DNA was purified using
beta-agarase (NEB) according to the manufacturer's instructions.
Then 0.6 µg gel-purified double-stranded cDNA
inserts in 47.5 µl TE was digested with Exonuclease III at
37°C for 30 min in a 60-µl reaction containing 6 µl
10x Exonuclease III buffer [0.5 M Tris (pH 8.0) and 50 mM
MgCl2], 0.6 µl 0.1 M DTT, 2.9 µl water, and 3 µl of 65
units/µl Exonuclease III (Life Technologies). The Exonuclease
was then digested with 136 µg Proteinase K (Boehringer
Mannheim) at 50°C for 15 min in a 100-µl reaction
containing 10 mM Tris (pH 7.8), 5 mM EDTA, and 0.5%
SDS. After two extractions with phenol-chloroform-isoamyl
alcohol (25:24:1), the resulting noncomplementary
single-stranded DNA (total amount ~0.3 µg) was ethanol-precipitated
and resuspended in 1 µl TE. A 5-µl hybridization
reaction was then set up as follows: 1 µl
Exonuclease III-digested cDNA inserts (an estimated
amount of 150 ng of single-stranded DNA) and 50 ng
single-stranded plasmid DNA from the starting multiple
sclerosis plaques library (prepared in vitro) in 2.5 µl deionized
formamide were mixed and heated at 80°C for 3 min
under 10 µl mineral oil. Then 0.5 µl (10 µg) of a blocking
oligonucleotide
(5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA-3'),
0.5 µl 5 M NaCl, and 0.5 µl 10x TE were added, and the mixture
was incubated at 42°C for 41 hr (calculated Cot of 23).
The remaining single-stranded plasmids were purified by
HAP chromatography, converted to double-stranded plasmids,
and electroporated into DH10B bacteria (Life Technologies)
as described above.
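
The 5' blocking oligonucleotide is meant to anneal to vector sequence flanking the insert on the T7 promoter side. A quick reverse-complement check illustrates this. The oligonucleotide sequence below is a reading of the OCR-damaged source and should be treated as an assumption, as should the use of the standard 20-nt T7 promoter primer for comparison.

```python
# Complement table for the four DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Assumed reading of the 5' blocking oligonucleotide (5' to 3').
BLOCKING_OLIGO = "CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA"

# Standard T7 promoter primer (5' to 3'), used here only for comparison.
T7_PRIMER = "TAATACGACTCACTATAGGG"

target = reverse_complement(BLOCKING_OLIGO)
print(target)
print(target.startswith(T7_PRIMER))
```

Under this reading, the strand the oligonucleotide pairs with begins with the T7 promoter sequence and runs through the SfiI and EcoRI sites of the polylinker into the adapter, which is the region a 5' vector block would need to cover.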



Normalization Method 4 

This is a reassociation-kinetics-based approach involving
hybridization of a 20-fold excess of cDNA inserts generated
by PCR with the library itself in the form of single-stranded
circles, followed by HAP purification of the remaining
single-stranded plasmids, conversion to double-strands,
electroporation into DH10B bacteria, and amplification
under ampicillin selection. PCR amplification of cDNA inserts
was performed using the Expand High Fidelity PCR
System (Boehringer Mannheim) according to the manufacturer's
instructions. This PCR system is composed of an
enzyme mixture containing thermostable Taq DNA and
Pwo DNA polymerases (Barnes 1994). An amount of 1 µl
(2.5-5.0 ng) DNA template [double-stranded plasmids (fetal
lung, parathyroid adenoma, senescent fibroblasts) or
single-stranded circles prepared in vitro (fetal heart,
14Nb2HI-LS20VV.fetal liver/spleen, and all mouse, rat, and 
schistosome libraries listed in Table 1)] was mixed with 2
µl dNTP stock (the final concentration of each dNTP in the
reaction is 200 µM), 5 µl of a 20-µM solution of T7 Primer
(5'-TAATACGACTCACTATAGGG-3'), 5 µl of a 20-µM solution
of T3 Primer (5'-ATTAACCCTCACTAAAGGGA-3'),
10 µl 10x Expand High Fidelity buffer, 0.75 µl Expand
High Fidelity enzyme mix (2.6 units), and 76.25 µl water.
Then 50 µl mineral oil was added and the reaction mixture
was subjected to the following amplification cycle conditions
in a Perkin Elmer Thermocycler: 7 min while ramping
up from room temperature to 94°C; 20 cycles of 1 min
at 94°C, 2 min at 55°C, and 3 min at 72°C; and 7 min at
72°C. PCR-amplified fragments were purified using the
High Pure PCR Product Purification Kit (Boehringer Mannheim)
as instructed by the manufacturer. The purified
PCR product was ethanol-precipitated and dissolved in 5
µl TE. Then 1.5 µl (0.5 µg) PCR product was mixed with
5 µl (50 ng) library DNA (single-stranded circles prepared
in vitro) in deionized formamide, 0.5 µl (10 µg) 5' blocking
oligo AV-1
(5'-CCTCGTGCCGAATTCTTGGCCTCGAGGGCCAAATTCCCTATAGTGAGTCGTATTA-3'),
0.5 µl (10 µg) 3' blocking oligo AR
(5'-ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGCT20-3'; used for all
but the fetal liver/spleen library), or alternatively, 0.5 µl
(10 µg) 3' blocking oligo AV-2
(5'-ATTAACCCTCACTAAAGGGAATAAGCTTGCGGCCGCTTAATTAAAGATCT19-3';
used only for the fetal liver/spleen library),
and this mixture was heated at 80°C for 3 min under 10 µl
of mineral oil. Then 1 µl 10x buffer-A [1.2 M NaCl, 0.1 M
Tris (pH 8.0), and 50 mM EDTA; used for fetal lung, fetal
heart, parathyroid adenoma, senescent fibroblasts, and
19.5-days postconception (dpc) mouse embryo] or, alternatively,
1 µl 10x buffer-B [1.2 M NaCl, 0.1 M Tris (pH 8.0),
SO mM EDTA, and 10% SDS; used for l4NT32HFLS20\V.fetal 
liver/spleen, 17.5-dpc mouse embryo, 13.5- to 14.5-dpc
mouse embryo, rat heart, rat kidney, and 8-week schistosome],
and 1.5 µl water were added, and the hybridization
was performed at 30°C for 24 hr (calculated Cot ~5). The
remaining single-stranded circles were purified by HAP
chromatography, converted to double-strands, and electroporated
into DH10B (Life Technologies) bacteria, as described
above.
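
The PCR recipe given for method 4 can be sanity-checked by summing the component volumes and back-calculating concentrations. The sketch below uses only volumes and concentrations quoted in the text; the dictionary keys and helper function are illustrative names for this sketch.

```python
# Volumes in microlitres, as listed for the method 4 PCR.
pcr_mix_ul = {
    "DNA template": 1.0,
    "dNTP stock": 2.0,
    "T7 primer (20 uM)": 5.0,
    "T3 primer (20 uM)": 5.0,
    "10x buffer": 10.0,
    "enzyme mix": 0.75,
    "water": 76.25,
}
total_ul = sum(pcr_mix_ul.values())


def final_concentration(stock: float, volume_ul: float, total_volume_ul: float) -> float:
    """Concentration after dilution, in the same units as the stock."""
    return stock * volume_ul / total_volume_ul


# Each primer: 5 ul of a 20 uM stock in the full reaction volume.
primer_final_uM = final_concentration(20.0, 5.0, total_ul)

# dNTP stock concentration implied by a 200 uM final concentration of each dNTP.
implied_dntp_stock_uM = 200.0 * total_ul / pcr_mix_ul["dNTP stock"]

# Programmed cycling time in minutes: initial ramp, 20 cycles, final extension.
cycling_min = 7 + 20 * (1 + 2 + 3) + 7

print(total_ul, primer_final_uM, implied_dntp_stock_uM, cycling_min)
```

The components add up to a 100-µl reaction with each primer at 1 µM, and the implied dNTP stock (10 mM each) matches the mixed dNTP stock described for the primer extension reactions.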



Subtractive Hybridization 

Double-stranded plasmid DNA from a pool of 4992 clones
grown individually in 384-well plates (IMAGE Consortium
plates LLAM 78-90, identification nos. 66696-67079 and
108168-112775) derived from the normalized fetal liver/
spleen library (1NFLS) was prepared using the Qiagen
Midi-prep kit according to the manufacturer's instructions
and converted to single-stranded circles in vitro, as
described above. Single-stranded circles were purified by





HAP chromatography and used as a template for PCR amplification
with T7 and T3 primers, as described above. An
amount of 1.5 µg of PCR-amplified cDNA inserts from the
LLAM 78-90 pool (in 4 µl deionized formamide) was
mixed with 50 ng of single-stranded circles from the
1NFLS library (in 2 µl deionized formamide), 2.1 µl (42 µg)
5' blocking oligo AV-1, and 2.1 µl (42 µg) 3' blocking oligo
AV-2. Then 10 µl mineral oil was added, and the mixture
was heated at 80°C for 3 min. Then 1.2 µl 10x buffer-B
and 0.6 µl water were added, and the hybridization was
performed at 30°C for 43 hr (calculated Cot = 27). The remaining
single-stranded circles were purified over HAP,
converted to double-strands, electroporated into DH10B
bacteria, and propagated under ampicillin selection to
generate the subtracted liver/spleen library (1NFLS-S1).
HAP-bound DNA was also processed and purified for use in
control experiments.
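
The numbers in the subtractive hybridization protocol are internally consistent, which can be confirmed with a few lines of bookkeeping. The sketch below uses only figures quoted in the text (plate numbers, identification ranges, volumes, and masses); variable names are illustrative.

```python
WELLS_PER_PLATE = 384

# Plates LLAM 78-90 inclusive.
plates = range(78, 90 + 1)
clones_from_plates = len(plates) * WELLS_PER_PLATE

# Clone identification number ranges (inclusive) given for the same plates.
id_ranges = [(66696, 67079), (108168, 112775)]
clones_from_ids = sum(last - first + 1 for first, last in id_ranges)

# Hybridization components in microlitres: driver, tracer, two blocking
# oligos, 10x buffer-B, and water.
volumes_ul = [4.0, 2.0, 2.1, 2.1, 1.2, 0.6]
total_ul = sum(volumes_ul)
buffer_strength = 10 * 1.2 / total_ul  # final strength of the 10x buffer

# Mass excess of PCR-amplified driver (1.5 ug) over tracer (50 ng).
mass_excess = 1500 / 50

print(clones_from_plates, clones_from_ids)
print(round(total_ul, 1), round(buffer_strength, 2), mass_excess)
```

Both the plate count and the identification ranges give 4992 clones, the components add up to a 12-µl reaction in which the buffer is at 1x, and the driver is in 30-fold mass excess over the tracer.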



ACKNOWLEDGMENTS 

We are most grateful to Dr. Joel A. Jessee (Life Technologies)
for helpful discussions and for having supplied us
with Gene II. We are also thankful to Dr. LaDeana Hillier
and Dr. Marco Marra (Genome Sequencing Center at
Washington University in St. Louis) for having diligently
provided us with feedback information on several features
of our libraries, based on the voluminous sequence data
that they obtained, which greatly facilitated our assessment
of the efficacy of the various methods that we developed.
We are also indebted to Dr. Stephen Brown (Columbia
University), Dr. Conrad Gilliam (Columbia University),
Dr. Anne Bowcock and Ms. Monique Spillman
(University of Texas Southwestern Medical Center at Dallas),
Dr. Donald Gilden (University of Colorado Health
Sciences Center), Dr. Val Sheffield (University of Iowa), Dr.
Roderick McInnes (University of Toronto and Hospital for
Sick Children, Canada), Dr. David Klein [National Institute
of Child Health and Human Development, National Institutes
of Health (NIH)], Dr. Anthony Albino and Dr. Alice
de Oliveira (Memorial Sloan-Kettering Cancer Center), Dr.
Stephen Marx (National Institute of Diabetes and Digestive
and Kidney Diseases, NIH), Dr. Barbara Burkhart (National
Institute of Environmental Health Sciences, NIH),
Dr. Kevin Becker [National Institute of Neurological Disorders
and Stroke (NINDS), NIH], Dr. Minoru Ko (Wayne
State University), Dr. Ronald Blanton and Dr. Aravinda
Chakravarti (Case Western Reserve University), and Dr.
Mark Boguski (National Center for Biotechnology Information,
NIH) for having either facilitated our access to or
provided tissue or total RNA from most sources used in
construction of the libraries described in this manuscript.
We are also most grateful to Mr. Long Su, Dr. Pierre Jelenc,
Ms. Lee Lawton, Mrs. Ling Qiu, and Ms. Susan Baumes for
most valuable assistance throughout this work. This work
was supported by grants from the U.S. Department of Energy
(FG02-91ER61233) and the National Center for Human
Genome Research, NIH (R01 HG00980), to M.B.S.
The work of C.L. was performed under the auspices of the
U.S. Department of Energy by Lawrence Livermore National
Laboratory (LLNL) under contract number W-7405-ENG-48.

The publication costs of this article were defrayed in
part by payment of page charges. This article must therefore
be hereby marked "advertisement" in accordance
with 18 USC section 1734 solely to indicate this fact.



REFERENCES 

Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick,
M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B.
Olde, R.F. Moreno, A.R. Kerlavage, W.R. McCombie, and
J. Craig Venter. 1991. Complementary DNA sequencing:
Expressed sequence tags and Human Genome Project.
Science 252: 1651-1656.

Adams, M.D., M. Dubnick, A.R. Kerlavage, R. Moreno, 
J.M. Kelley, T.R. Utterback, J.W. Nagle, C. Fields, and J. 
Craig Venter. 1992. Sequence identification of 2,375 
human brain genes. Nature 355: 632-634. 

Adams, M.D., A.R. Kerlavage, C. Fields, and J.C. Venter. 
1993a. 3,400 new expressed sequence tags identify 
diversity of transcripts in human brain. Nature Genet. 
4: 256-267. 

Adams, M.D., M.B. Soares, A.R. Kerlavage, C. Fields, and 
J.C. Venter. 1993b. Rapid cDNA sequencing (expressed 
sequence tags) from a directionally cloned human infant 
brain cDNA library. Nature Genet. 4: 373-380. 

Adams, M.D., A.R. Kerlavage, R.D. Fleischmann, R.A. 
Fuldner, C.J. Bult, N.H. Lee, E.F. Kirkness, K.G. 
Weinstock, J.D. Gocayne, O. White, et al. 1995. Initial 
assessment of human gene diversity and expression 
patterns based upon 83 million nucleotides of cDNA 
sequence. Nature 377: 3-174. 

Altschul, S.F., W. Gish, W. Miller, E. Myers, and D.J. 
Lipman. 1990. Basic local alignment search tool. J. Mol. 
Biol. 215: 403-410. 

Barnes, W.M. 1994. PCR amplification of up to 35-kb 
DNA with high fidelity and high yield from lambda 
bacteriophage templates. Proc. Natl. Acad. Sci. 
91: 2216-2220. 

Berry, R., T.J. Stevens, N.A. Walter, A.S. Wilcox, T. 
Rubano, J.A. Hopkins, J. Weber, R. Goold, M.B. Soares, 
and J.M. Sikela. 1995. Gene-based sequence-tagged-sites 
(STSs) as the basis for a human gene map. Nature Genet. 
10: 415-423. 

Bishop, J.O., J.G. Morton, M. Rosbash, and M. 
Richardson. 1974. Three abundance classes in HeLa cell 
messenger RNA. Nature 250: 199-204. 

Davidson, E.H. and R.J. Britten. 1979. Regulation of gene 
expression: Possible role of repetitive sequences. Science 
204: 1052-1059. 

Hillier, L., G. Lennon, M. Becker, M. Bonaldo, B. 
Chiapelli, S. Chissoe, N. Dietrich, T. DuBuque, A. 
Favello, W. Gish, et al. 1996. Generation and analysis of 
280,000 human expressed sequence tags. Genome Res. 
(this issue). 



Hoheisel, J.D. 1993. On the activities of Escherichia coli 
exonuclease III. Anal. Biochem. 209: 238-246. 

Houlgatte, R., R. Mariage-Samson, S. Duprat, A. Tessier, 
S. Bentolila, B. Lamy, and C. Auffray. 1995. The 
Genexpress Index: A resource for gene discovery and 
genic map of the human genome. Genome Res. 
5: 272-304. 

Johnston, S., J.H. Lee, and D.S. Ray. 1985. High-level 
expression of M13 gene II protein from an inducible 
polycistronic messenger RNA. Gene 34: 137-145. 

Khan, A.S., A.S. Wilcox, M.H. Polymeropoulos, J.A. 
Hopkins, T.J. Stevens, M. Robinson, A.K. Orpana, and 
J.M. Sikela. 1992. Single pass sequencing and physical 
and genetic mapping of human brain cDNAs. Nature 
Genet. 2: 180-185. 

Lennon, G.G., C. Auffray, M. Polymeropoulos, and M.B. 
Soares. 1996. The I.M.A.G.E. Consortium: An integrated 
molecular analysis of genomes and their expression. 
Genomics 33: 151-152. 

McCombie, W.R., M.D. Adams, J.M. Kelley, M.G. FitzGerald, 
T.R. Utterback, M. Khan, M. Dubnick, A.R. Kerlavage, J.C. 
Venter, and C. Fields. 1992. Caenorhabditis elegans 
expressed sequence tags identify gene families and 
potential disease gene homologues. Nature Genet. 
1: 124-131. 

Okubo, K., N. Hori, R. Matoba, T. Niiyama, A. 
Fukushima, Y. Kojima, and K. Matsubara. 1992. Large 
scale cDNA sequencing for analysis of quantitative and 
qualitative aspects of gene expression. Nature Genet. 
2: 173-179. 

Rasched, I. and E. Oberer. 1986. Ff coliphages: Structural 
and functional relationships. Microbiol. Rev. 50: 401-427. 

Soares, M.B. 1994. Construction of directionally cloned 
cDNA libraries in phagemid vectors. In Automated DNA 
sequencing and analysis (ed. M.D. Adams, C. Fields, and J.C. 
Venter), pp. 110-114. Academic Press, New York, NY. 

Soares, M.B., M.F. Bonaldo, P. Jelenc, L. Su, L. Lawton, 
and A. Efstratiadis. 1994. Construction and 
characterization of a normalized cDNA library. Proc. Natl. 
Acad. Sci. 91: 9228-9232. 

Vieira, J. and J. Messing. 1987. Production of 
single-stranded plasmid DNA. Methods Enzymol. 
153: 3-11. 



Received June N; accepted in revised form July 29, 



806 GENOME RESEARCH 



